TITLE: Improved proteases and methods for producing them
FIELD OF INVENTION 
SUMMARY OF THE INVENTION

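In a first aspect, the invention relates to a secreted mature polypeptide which after maturation has protease activity, which polypeptide when expressed and before
maturation comprises a heterologous pro-region, and which polypeptide:

(a) comprises an amino acid sequence which is at least 70% identical, or preferably
75%, 80%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%,
98%, or 99% identical to the amino acid sequence of the mature part of the polypeptide
shown in SEQ ID NO: 28; SEQ ID NO: 33; SEQ ID NO: 37; SEQ ID NO: 41; SEQ ID
NO: 43; or SEQ ID NO: 45;

(b) comprises an amino acid sequence which is at least 70% identical, or preferably
75%, 80%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%,
98%, or 99% identical to the amino acid sequence of the mature part of the polypeptide
encoded by the polynucleotide in SEQ ID NO: 1; SEQ ID NO: 2; SEQ ID NO: 25; SEQ
ID NO: 31; SEQ ID NO: 32; SEQ ID NO: 36; SEQ ID NO: 40; or SEQ ID NO: 44;

(c) comprises a mature part which is a variant of the mature part of the polypeptide having
the amino acid sequence of SEQ ID NO: 28; SEQ ID NO: 33; SEQ ID NO: 37; SEQ ID
NO: 41; SEQ ID NO: 43; or SEQ ID NO: 45, the segment comprising a substitution,
deletion, extension, and/or insertion of one or more amino acids;

(d) is encoded by a nucleic acid sequence which hybridizes under very low, low, medium-low,
medium, medium-high, high, or very high stringency conditions with:

(I) a polynucleotide encoding the mature part of a protease, said
polynucleotide obtainable from genomic DNA from Nocardiopsis
dassonvillei subsp. dassonvillei DSM 43235 by use of primers SEQ ID
NO's: 26 and 27; from Nocardiopsis alba DSM 15647 by use of primers
SEQ ID NO's: 34 and 35; from Nocardiopsis prasina DSM 15648 by use of
primers SEQ ID NO's: 38 and 39; or from Nocardiopsis prasina DSM 15649
by use of primers SEQ ID NO's: 42 and 39;

(II) the polynucleotide of SEQ ID NO: 1; of SEQ ID NO: 2; of SEQ ID NO: 25; of
SEQ ID NO: 31; of SEQ ID NO: 32; of SEQ ID NO: 36; of SEQ ID NO: 40;
or of SEQ ID NO: 44;

(III) a subsequence of (I) or (II) of at least 500 nucleotides, preferably 400, 300,
200, or 100 nucleotides; or

(IV) a complementary strand of (I), (II), or (III);

(e) is an allelic variant of (a), (b), (c), or (d); or

(f) is a fragment of (a), (b), (c), (d), or (e).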
(d) is encoded by a nucleic acid sequence which hybridizes under very low, low, medium-low,
medium, medium-high, high, or very high stringency conditions with:

(I) a polynucleotide encoding the mature part of a protease, said
polynucleotide obtainable from genomic DNA from Nocardiopsis
dassonvillei subsp. dassonvillei DSM 43235 by use of primers SEQ ID
NO's: 26 and 27; from Nocardiopsis alba DSM 15647 by use of primers
SEQ ID NO's: 34 and 35; from Nocardiopsis prasina DSM 15648 by use of
primers SEQ ID NO's: 38 and 39; or from Nocardiopsis prasina DSM 15649
by use of primers SEQ ID NO's: 42 and 39;

(II) the polynucleotide of SEQ ID NO: 1; of SEQ ID NO: 2; of SEQ ID NO: 25; of
SEQ ID NO: 31; of SEQ ID NO: 32; of SEQ ID NO: 36; of SEQ ID NO: 40;
or of SEQ ID NO: 44;

(III) a subsequence of (I) or (II) of at least 500 nucleotides, preferably 400, 300,
200, or 100 nucleotides; or

(IV) a complementary strand of (I), (II), or (III);

(e) is an allelic variant of (a), (b), (c), or (d); or
(f) is a fragment of (a), (b), (c), (d), or (e).

Still in a third aspect, the invention relates to a recombinant expression vector or
polynucleotide construct comprising a polynucleotide as defined in the previous aspect; …
a recombinant host cell comprising a polynucleotide as … construct as defined in … expression …

… comprising a polynucleotide as defined in the second aspect or an
expression vector or polynucleotide construct as defined in the third aspect;

… a method for producing a polypeptide as defined in … a transgenic plant or animal as defined in the fifth or sixth aspect …

(a) at least one fat-soluble vitamin, and/or 

(b) at least one water-soluble vitamin, and/or 

(c) at least one trace mineral; 
an animal feed composition having a crude protein content of 50 to 800 g/kg and
comprising at least one polypeptide as defined in the first aspect, or at least one feed additive of 
the previous aspect; 

a composition comprising at least one polypeptide as defined in the first aspect, together 
with at least one other enzyme selected from amongst phytase (EC 3.1.3.8 or 3.1.3.26); xylanase
(EC 3.2.1.8); galactanase (EC 3.2.1.89); alpha-galactosidase (EC 3.2.1.22); protease (EC 3.4.-.-);
phospholipase A1 (EC 3.1.1.32); phospholipase A2 (EC 3.1.1.4); lysophospholipase (EC 3.1.1.5);
phospholipase C (EC 3.1.4.3); phospholipase D (EC 3.1.4.4); and/or beta-glucanase (EC 3.2.1.4 or
EC 3.2.1.6);

a method for using at least one polypeptide as defined in the first aspect, for improving the 
nutritional value of an animal feed, for increasing digestible and/or soluble protein in animal diets, 
for increasing the degree of hydrolysis of proteins in animal diets, and/or for the treatment of 
vegetable proteins, the method comprising including the polypeptide(s) in animal feed, and/or in a
composition for use in animal feed; 

a method for using at least one polypeptide as defined in the first aspect, comprising 
including the polypeptide(s) in a detergent formulation.

DETAILED DESCRIPTION OF THE INVENTION 

Proteases are classified on the basis of their catalytic mechanism into the following groups:
Serine proteases (S), Cysteine proteases (C), Aspartic proteases (A), Metalloproteases (M), and
Unknown, or as yet unclassified, proteases (U); see Handbook of Proteolytic Enzymes,
A.J. Barrett, N.D. Rawlings, J.F. Woessner (eds), Academic Press (1998), in particular the general
introduction part.

Serine proteases are ubiquitous, being found in viruses, bacteria and eukaryotes; they
include exopeptidase, endopeptidase, oligopeptidase and omega-peptidase activity. Over 20
families (denoted S1 - S27) of serine proteases have been identified, these being grouped into 6
clans denoted SA, SB, SC, SE, SF, and SG, on the basis of structural similarity and functional
evidence (Barrett et al. 1998, Handbook of Proteolytic Enzymes). Structures are known for at least
four of the clans (SA, SB, SC and SE); these appear to be totally unrelated, suggesting at least
four evolutionary origins of serine peptidases. Alpha-lytic endopeptidases belong to the
chymotrypsin (SA) clan, within which they have been assigned to subfamily A of the S2 family
(S2A).
Another classification system of proteolytic enzymes is based on sequence information
and is therefore used more often in the art of molecular biology; it is described in Rawlings, N.D. et
al., 2002, MEROPS: The protease database, Nucleic Acids Res. 30:343-346. The MEROPS
database is freely available electronically at http://www.merops.ac.uk. According to the MEROPS
system, the proteolytic enzymes classified as S2A in the Handbook of Proteolytic Enzymes are
in MEROPS classified as 'S1E' proteases (Rawlings ND, Barrett AJ. (1993) Evolutionary families
of peptidases, Biochem. J. 290:205-218).

In particular embodiments, the proteases of the invention and for use according to the 
invention are selected from the group consisting of: 

(a) proteases belonging to the EC 3.4.-.- enzyme group; 

(b) Serine proteases belonging to the S group of the above Handbook; 
(c) Serine proteases of peptidase family S2A;

(c2) Serine proteases of peptidase family S1E as described in Biochem. J. 290:205-218 (1993)
and in MEROPS, a protease database, release 6.20, March 24, 2003 (www.merops.ac.uk). The
database is described in Rawlings, N.D., O'Brien, E.A. & Barrett, A.J. (2002) MEROPS: the
protease database. Nucleic Acids Res. 30, 343-346.

For determining whether a given protease is a Serine protease, and a family S2A protease, 
reference is made to the above Handbook and the principles indicated therein. Such
determination can be carried out for all types of proteases, be it naturally occurring or wild-type 
proteases; or genetically engineered or synthetic proteases. 

Protease activity can be measured using any assay, in which a substrate is employed, that 
includes peptide bonds relevant for the specificity of the protease in question. Assay-pH and 
assay-temperature are likewise to be adapted to the protease in question. Examples of assay-pH-
values are pH 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, or 12. Examples of assay-temperatures are 30, 35, 37,
40, 45, 50, 55, 60, 65, 70, 80, 90, or 95°C.

Examples of protease substrates are casein, such as Azurine-Crosslinked Casein (AZCL- 
casein). Two protease assays are described in Example 2 herein, either of which can be used to 
determine protease activity. For the purposes of this invention, the so-called pNA Assay is a 
preferred assay. 

There are no limitations on the origin of the protease of the invention and/or for use 
according to the invention. Thus, the term protease includes not only natural or wild-type 
proteases obtained from microorganisms of any genus, but also any mutants, variants, fragments 
etc. thereof exhibiting protease activity, as well as synthetic proteases, such as shuffled 
proteases, and consensus proteases. Such genetically engineered proteases can be prepared as 
is generally known in the art, e.g. by site-directed mutagenesis, by PCR (using a PCR fragment
containing the desired mutation as one of the primers in the PCR reactions), or by random
mutagenesis. The preparation of consensus proteins is described in e.g. EP 897985. The term
"obtained from" as used herein in connection with a given source shall mean that the polypeptide
encoded by the nucleic acid sequence is produced by the source or by a cell in which the nucleic
acid sequence from the source is present. In a preferred embodiment, the polypeptide is secreted
extracellularly.

In a specific embodiment, the protease is a low-allergenic variant, designed to invoke a 
reduced immunological response when exposed to animals, including man. The term 
immunological response is to be understood as any reaction by the immune system of an animal 
exposed to the protease. One type of immunological response is an allergic response leading to 
increased levels of IgE in the exposed animal. Low-allergenic variants may be prepared using
techniques known in the art. For example, the protease may be conjugated with polymer moieties
shielding portions or epitopes of the protease involved in an immunological response. Conjugation
with polymers may involve in vitro chemical coupling of polymer to the protease, e.g. as described
in WO 96/17929, WO 98/30682, WO 98/35026, and/or WO 99/00489. Conjugation may in addition
or alternatively thereto involve in vivo coupling of polymers to the protease. Such conjugation may
be achieved by genetic engineering of the nucleotide sequence encoding the protease, inserting
consensus sequences encoding additional glycosylation sites in the protease and expressing the
protease in a host capable of glycosylating the protease, see e.g. WO 00/26354. Another way of
providing low-allergenic variants is genetic engineering of the nucleotide sequence encoding the
protease so as to cause the protease to self-oligomerize, so that protease monomers may
shield the epitopes of other protease monomers, thereby lowering the antigenicity of the
oligomers. Such products and their preparation are described e.g. in WO 96/16177. Epitopes
involved in an immunological response may be identified by various methods such as the phage 
display method described in WO 00/26230 and WO 01/83559, or the random approach described 
in EP 561907. Once an epitope has been identified, its amino acid sequence may be altered to 
produce altered immunological properties of the protease by known gene manipulation techniques 
such as site directed mutagenesis (see e.g. WO 00/26230, WO 00/26354 and/or WO 00/22103) 
and/or conjugation of a polymer may be done in sufficient proximity to the epitope for the polymer 
to shield the epitope. 

The first aspect of the invention relates to a secreted mature polypeptide which after 
maturation has protease activity, which polypeptide when expressed and before maturation 
comprises a heterologous pro-region, and which polypeptide: 

(a) comprises an amino acid sequence which is at least 70% identical, or preferably
75%, 80%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%,
98%, or 99% identical to the amino acid sequence of the mature part of the polypeptide
shown in SEQ ID NO: 28; SEQ ID NO: 33; SEQ ID NO: 37; SEQ ID NO: 41; SEQ ID
NO: 43; or SEQ ID NO: 45;

(b) comprises an amino acid sequence which is at least 70% identical, or preferably
75%, 80%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%,
98%, or 99% identical to the amino acid sequence of the mature part of the polypeptide
encoded by the polynucleotide in SEQ ID NO: 1; SEQ ID NO: 2; SEQ ID NO: 25; SEQ
ID NO: 31; SEQ ID NO: 32; SEQ ID NO: 36; SEQ ID NO: 40; or SEQ ID NO: 44;

(c) comprises a mature part which is a variant of the mature part of the polypeptide having 
the amino acid sequence of SEQ ID NO: 28; SEQ ID NO: 33; SEQ ID NO: 37; SEQ ID 
NO: 41; SEQ ID NO: 43; or SEQ ID NO: 45, the segment comprising a substitution, 
deletion, extension, and/or insertion of one or more amino acids; 

(d) is encoded by a nucleic acid sequence which hybridizes under very low, low, medium-low,
medium, medium-high, high, or very high stringency conditions with:

(I) a polynucleotide encoding the mature part of a protease, said 
polynucleotide obtainable from genomic DNA from Nocardiopsis
dassonvillei subsp. dassonvillei DSM 43235 by use of primers SEQ ID
NO's: 26 and 27; from Nocardiopsis alba DSM 15647 by use of primers
SEQ ID NO's: 34 and 35; from Nocardiopsis prasina DSM 15648 by use of 
primers SEQ ID NO's: 38 and 39; or from Nocardiopsis prasina DSM 15649 
by use of primers SEQ ID NO's: 42 and 39; 

(II) the polynucleotide of SEQ ID NO: 1; of SEQ ID NO: 2; of SEQ ID NO: 25; of
SEQ ID NO: 31; of SEQ ID NO: 32; of SEQ ID NO: 36; of SEQ ID NO: 40;
or of SEQ ID NO: 44;

(III) a subsequence of (I) or (II) of at least 500 nucleotides, preferably 400, 300,
200, or 100 nucleotides; or

(IV) a complementary strand of (I), (II), or (III); 

(e) is an allelic variant of (a), (b), (c), or (d); or 

(f) is a fragment of (a), (b), (c), (d), or (e).

For the purposes of the present invention, the degree of identity between two amino acid 
sequences, as well as the degree of identity between two nucleotide sequences, is determined by 
the program "align" which is a Needleman-Wunsch alignment (i.e. a global alignment). The 
program is used for alignment of polypeptide, as well as nucleotide sequences. The default 
scoring matrix BLOSUM50 is used for polypeptide alignments, and the default identity matrix is 
used for nucleotide alignments. The penalty for the first residue of a gap is -12 for polypeptides 
and -16 for nucleotides. The penalties for further residues of a gap are -2 for polypeptides and -4
for nucleotides.
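The global alignment described above can be sketched as follows. This is a minimal Python illustration of a Needleman-Wunsch alignment with affine gap penalties (Gotoh's three-matrix formulation), in which a gap of length k scores the first-residue penalty plus (k - 1) times the further-residue penalty; it is not the FASTA "align" program itself. The nucleotide match/mismatch scores (+5/-4) and the identity convention (identical columns divided by the full alignment length, gaps included) are assumptions for illustration only; for polypeptides, a BLOSUM50 lookup with penalties -12/-2 would replace the hypothetical `dna_score` helper.

```python
from typing import Callable, Dict, List, Tuple

NEG_INF = float("-inf")


def global_align(a: str, b: str, score: Callable[[str, str], int],
                 first_gap: int, extra_gap: int) -> Tuple[str, str]:
    """Global alignment; a gap of length k scores first_gap + (k - 1) * extra_gap."""
    n, m = len(a), len(b)
    # M: a[i-1] paired with b[j-1]; X: a[i-1] opposite a gap; Y: b[j-1] opposite a gap.
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = first_gap + (i - 1) * extra_gap
    for j in range(1, m + 1):
        Y[0][j] = first_gap + (j - 1) * extra_gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            M[i][j] = score(a[i - 1], b[j - 1]) + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            X[i][j] = max(M[i - 1][j] + first_gap, X[i - 1][j] + extra_gap, Y[i - 1][j] + first_gap)
            Y[i][j] = max(M[i][j - 1] + first_gap, Y[i][j - 1] + extra_gap, X[i][j - 1] + first_gap)

    # Trace back from whichever matrix holds the best score in the final cell.
    mats: Dict[str, List[List[float]]] = {"M": M, "X": X, "Y": Y}
    state = max(mats, key=lambda k: mats[k][n][m])
    i, j = n, m
    out_a: List[str] = []
    out_b: List[str] = []
    while i > 0 or j > 0:
        here = mats[state][i][j]
        if state == "M":
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            target = here - score(a[i - 1], b[j - 1])
            i, j = i - 1, j - 1
            state = next(k for k in ("M", "X", "Y") if mats[k][i][j] == target)
        elif state == "X":
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
            steps = (("M", first_gap), ("X", extra_gap), ("Y", first_gap))
            state = next(k for k, pen in steps if mats[k][i][j] + pen == here)
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
            steps = (("M", first_gap), ("Y", extra_gap), ("X", first_gap))
            state = next(k for k, pen in steps if mats[k][i][j] + pen == here)
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aln_a: str, aln_b: str) -> float:
    """Identical columns / alignment length (gaps included); an assumed convention."""
    matches = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    return 100.0 * matches / len(aln_a)


def dna_score(x: str, y: str) -> int:
    # Assumed nucleotide identity matrix: +5 for a match, -4 for a mismatch.
    return 5 if x == y else -4


if __name__ == "__main__":
    # Nucleotide gap penalties from the text: -16 for the first residue, -4 for each further one.
    aln_a, aln_b = global_align("ACGTACGT", "ACGACGT", dna_score, first_gap=-16, extra_gap=-4)
    print(aln_a)
    print(aln_b)
    print(percent_identity(aln_a, aln_b))
```

Because the alignment is global, every residue of both sequences appears in the result, which is why the identity denominator has to be chosen explicitly; other conventions (for example, dividing by the length of the shorter sequence) give different percentages for the same alignment.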
"Align" is part of the FASTA package version v20u6 (see W. R. Pearson and D. J. Lipman (1988),
"Improved Tools for Biological Sequence Analysis", PNAS 85:2444-2448, and W. R. Pearson (1990),
"Rapid and Sensitive Sequence Comparison with FASTP and FASTA", Methods in Enzymology
183:63-98). FASTA protein alignments use the Smith-Waterman algorithm.

The degree of identity between two amino acid sequences may also be determined by the
Clustal method (Higgins, 1989, CABIOS 5: 151-153) using the LASERGENE™ MEGALIGN™
software (DNASTAR, Inc., Madison, WI) with the following multiple alignment parameters: gap
penalty of 10 and gap length penalty of 10. Pairwise alignment parameters are Ktuple=1, gap
penalty=3, windows=5, and diagonals=5. The degree of identity between two nucleotide
sequences may be determined using the same algorithm and software package as described
above; the pairwise alignment parameters are Ktuple=3, gap penalty=3 and windows=20.

A fragment of one of the encoding polynucleotide sequences of the invention is a
polynucleotide which encodes a polypeptide having one or more amino acids deleted from the
amino and/or carboxy terminus compared to the … amino acid sequence … residues, or at least 125
amino acid residues, or at least 150 amino acid residues, or at least 160 amino acid residues, or at
least 155 amino acid residues, or at least 170 amino acid residues, or at least 175 amino acid
residues.

An allelic variant denotes any of two or more alternative forms of a gene occupying the
same chromosomal locus. Allelic variation arises naturally through mutation, and may result in
polymorphism within populations. Gene mutations can be silent (no change in the encoded
polypeptide) or may encode polypeptides having altered amino acid sequences. An allelic variant
of a polypeptide is a polypeptide encoded by an allelic variant of a gene.

The present invention also relates to isolated polypeptides having protease activity and
which are encoded by nucleic acid sequences which hybridize under very low, low, medium-low,
medium, medium-high, high, or very high stringency conditions with a nucleic acid probe which
hybridizes under the same conditions with (I) a polynucleotide encoding the mature part of a
protease, said polynucleotide obtainable from genomic DNA from Nocardiopsis dassonvillei subsp.
dassonvillei DSM 43235 by use of primers SEQ ID NO's: 26 and 27; from Nocardiopsis alba DSM
15647 by use of primers SEQ ID NO's: 34 and 35; from Nocardiopsis prasina DSM 15648 by use of
primers SEQ ID NO's: 38 and 39; or from Nocardiopsis prasina DSM 15649 by use of primers SEQ
ID NO's: 42 and 39; (II) the polynucleotide of SEQ ID NO: 1; of SEQ ID NO: 2; of SEQ ID NO: 25; of
SEQ ID NO: 31; of SEQ ID NO: 32; of SEQ ID NO: 36; of SEQ ID NO: 40; or of SEQ ID NO: 44; (III) a
subsequence of (I) or (II) of at least 500 nucleotides, preferably 400, 300, 200, or 100 nucleotides;
or (IV) a complementary strand of (I), (II), or (III) (J. Sambrook, E.F. Fritsch, and T. Maniatis, 1989,
Molecular Cloning, A Laboratory Manual, 2nd edition, Cold Spring Harbor, New York). In a
particular embodiment the nucleic acid probe is selected from amongst the nucleic acid
sequences of (a), (b), or (c) above. A polynucleotide corresponding to the mature peptide … from
Nocardiopsis alba DSM 15647 by use of primers SEQ ID NO's: 34 and 35 …

… should be at least 15, preferably at least 25, and more preferably at least 35 nucleotides in
length. Longer probes can also be used. Both DNA and RNA probes can be used. …

… a polypeptide having protease activity. Genomic or other DNA from such other organisms may be
separated by agarose or polyacrylamide gel electrophoresis, or other separation techniques. DNA
from the libraries or the separated DNA may be transferred to and immobilized on nitrocellulose or
other suitable carrier material. In order to identify a clone or DNA which is homologous to … its
complementary strand, or a subsequence thereof, under very low to very high stringency
conditions. Molecules to which the nucleic acid probe hybridizes under these conditions are
detected using X-ray film.

For long probes of at least 100 nucleotides in length, very low to very high stringency
conditions are defined as prehybridization and hybridization at 42°C in 5X SSPE, 0.3% SDS, 200
µg/ml sheared and denatured salmon sperm DNA, and either 25% formamide for very low and low
stringencies, 35% formamide for medium and medium-high stringencies, or 50% formamide for
high and very high stringencies, following standard Southern blotting procedures.

For long probes of at least 100 nucleotides in length, the carrier material is finally washed
three times each for 15 minutes using 0.2 x SSC, 0.2% SDS, 20% formamide, preferably at least at
45°C (very low stringency), more preferably at least at 50°C (low stringency), more preferably at
least at 55°C (medium stringency), more preferably at least at 60°C (medium-high stringency),
even more preferably at least at 65°C (high stringency), and most preferably at least at 70°C (very
high stringency).
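For readers automating protocol setup, the long-probe conditions above can be captured as a small lookup table. The Python sketch below only restates the values given in the text; the dictionary layout and the function names `long_probe_conditions` and `strictest_level_for_wash` are hypothetical choices for illustration, and the second helper considers the wash temperature alone, not the formamide concentration.

```python
from typing import Dict, Tuple

# (formamide % during prehybridization/hybridization, minimum final wash temperature in °C),
# ordered from lowest to highest stringency.
LONG_PROBE_STRINGENCY: Dict[str, Tuple[int, int]] = {
    "very low": (25, 45),
    "low": (25, 50),
    "medium": (35, 55),
    "medium-high": (35, 60),
    "high": (50, 65),
    "very high": (50, 70),
}


def long_probe_conditions(level: str) -> Dict[str, object]:
    """Conditions for probes of at least 100 nucleotides at a named stringency level."""
    if level not in LONG_PROBE_STRINGENCY:
        raise ValueError(f"unknown stringency level: {level!r}")
    formamide_pct, min_wash_c = LONG_PROBE_STRINGENCY[level]
    return {
        "hybridization_temp_c": 42,
        "hybridization_buffer": "5X SSPE, 0.3% SDS, 200 µg/ml sheared, denatured salmon sperm DNA",
        "formamide_pct": formamide_pct,
        "final_washes": 3,
        "wash_minutes_each": 15,
        "min_wash_temp_c": min_wash_c,
    }


def strictest_level_for_wash(wash_temp_c: float) -> str:
    """Highest named level whose minimum wash temperature is reached (wash temperature only)."""
    met = [lvl for lvl, (_, t) in LONG_PROBE_STRINGENCY.items() if wash_temp_c >= t]
    if not met:
        raise ValueError("wash temperature below the very low stringency minimum")
    return met[-1]  # dicts keep insertion order, so the last entry met is the strictest


if __name__ == "__main__":
    print(long_probe_conditions("medium-high"))
    print(strictest_level_for_wash(62))
```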

For short probes of about 15 nucleotides to about 70 nucleotides in length, stringency
conditions are defined as prehybridization, hybridization, and washing post-hybridization at 5°C to
10°C below the calculated Tm using the calculation according to Bolton and McCarthy (1962,
Proceedings of the National Academy of Sciences USA 48:1390) in 0.9 M NaCl, 0.09 M Tris-HCl
pH 7.6, 6 mM EDTA, 0.5% NP-40, 1X Denhardt's solution, 1 mM sodium pyrophosphate, 1 mM
sodium monobasic phosphate, 0.1 mM ATP, and 0.2 mg of yeast RNA per ml following standard
Southern blotting procedures.

For short probes of about 15 nucleotides to about 70 nucleotides in length, the carrier
material is washed once in 6X SSC plus 0.1% SDS for 15 minutes and twice each for 15 minutes
using 6X SSC at 5°C to 10°C below the calculated Tm.
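The short-probe conditions above depend on a calculated Tm. As a rough sketch, the Python below uses the widely quoted salt-adjusted approximation Tm = 81.5 + 16.6·log10[Na+] + 0.41·(%GC) - 600/N, where N is the probe length; treat that exact form and its constants as an assumption to be checked against Bolton and McCarthy (1962) before use. Only the 0.9 M NaCl of the buffer enters the calculation; the other buffer components are ignored, and the function names are hypothetical.

```python
import math
from typing import Tuple


def tm_short_probe(probe: str, na_molar: float = 0.9) -> float:
    """Approximate Tm (°C) for a 15-70 nt probe, using an assumed salt-adjusted formula."""
    seq = probe.upper()
    n = len(seq)
    if not 15 <= n <= 70:
        raise ValueError("formula applied here only to probes of 15-70 nucleotides")
    gc_percent = 100.0 * (seq.count("G") + seq.count("C")) / n
    return 81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent - 600.0 / n


def short_probe_window(probe: str, na_molar: float = 0.9) -> Tuple[float, float]:
    """Range from 10°C to 5°C below Tm used for hybridization and washing."""
    tm = tm_short_probe(probe, na_molar)
    return tm - 10.0, tm - 5.0


if __name__ == "__main__":
    low, high = short_probe_window("ACGTACGTACGTACGTACGT")
    print(f"hybridize and wash between {low:.1f} and {high:.1f} °C")
```

For a 20-mer with 50% GC at 0.9 M Na+, this approximation gives a Tm of roughly 71°C, so the working window would be roughly 61°C to 66°C.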

The present invention also relates to variants of the polypeptide of the invention, 
comprising a substitution, deletion, and/or insertion of one or more amino acids. 

In a particular embodiment, amino acid changes are of a minor nature, that is conservative 
amino acid substitutions that do not significantly affect the folding and/or activity of the protein; 
small deletions, typically of one to about 30 amino acids; small amino- or carboxyl-terminal 
extensions, such as an amino-terminal methionine residue; a small peptide of up to about 20-25 
residues; or a small extension that facilitates purification by changing net charge or another 
function, such as a poly-histidine tract, an antigenic epitope or a binding domain. 

Examples of conservative substitutions are within the group of basic amino acids (arginine,
lysine and histidine), acidic amino acids (glutamic acid and aspartic acid), polar amino acids
(glutamine and asparagine), hydrophobic amino acids (leucine, isoleucine and valine), aromatic
amino acids (phenylalanine, tryptophan and tyrosine), and small amino acids (glycine, alanine,
serine, threonine and methionine). Amino acid substitutions which do not generally alter the
specific activity are known in the art and are described, for example, by H. Neurath and R.L. Hill,
1979, In, The Proteins, Academic Press, New York. The most commonly occurring exchanges are
Ala/Ser, Val/Ile, Asp/Glu, Thr/Ser, Ala/Gly, Ala/Thr, Ser/Asn, Ala/Val, Ser/Gly, Tyr/Phe, Ala/Pro,
Lys/Arg, Asp/Asn, Leu/Ile, Leu/Val, Ala/Glu, and Asp/Gly, as well as these in reverse.
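The substitution groups listed above lend themselves to a simple check. The following Python sketch encodes those six groups as one-letter codes and reports, for each position-wise substitution between a reference and an equal-length variant, whether it stays within one group. The helper names are hypothetical, insertions and deletions are out of scope, and the Neurath and Hill exchange list (which includes cross-group pairs such as Ala/Val) is not encoded.

```python
from typing import Dict, List, Set, Tuple

# The six groups named in the text, as one-letter amino acid codes.
CONSERVATIVE_GROUPS: Dict[str, Set[str]] = {
    "basic": set("RKH"),
    "acidic": set("ED"),
    "polar": set("QN"),
    "hydrophobic": set("LIV"),
    "aromatic": set("FWY"),
    "small": set("GASTM"),
}


def is_conservative(original: str, replacement: str) -> bool:
    """True if both residues fall within one of the groups above."""
    return any(original in g and replacement in g for g in CONSERVATIVE_GROUPS.values())


def substitutions(reference: str, variant: str) -> List[Tuple[int, str, str, bool]]:
    """List (1-based position, old residue, new residue, conservative?) for equal-length sequences."""
    if len(reference) != len(variant):
        raise ValueError("this sketch handles substitutions only, not insertions or deletions")
    return [
        (pos, old, new, is_conservative(old, new))
        for pos, (old, new) in enumerate(zip(reference, variant), start=1)
        if old != new
    ]


if __name__ == "__main__":
    for pos, old, new, conservative in substitutions("AKDLFT", "SRDLYV"):
        print(f"{old}{pos}{new}: {'conservative' if conservative else 'non-conservative'}")
```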

In a particular embodiment, the polypeptides of the invention and for use according to the 
invention are acid-stable. For the present purposes, the term acid-stable means that the residual
activity after 2 hours of incubation at pH 3.0 and 37°C is at least 50%, as compared to the
residual activity of a corresponding sample incubated for 2 hours at pH 9.0 and 5°C. In a particular
embodiment, the residual activity is at least 60%, 70%, 80% or at least 90%.
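The acid-stability criterion is a simple ratio. A minimal Python sketch, assuming the two activities are measured with the same assay and in the same units (for instance, with the pNA assay mentioned above); the function names are hypothetical.

```python
def residual_activity_pct(activity_ph3_37c: float, activity_ph9_5c: float) -> float:
    """Activity after 2 h at pH 3.0/37°C as a percentage of the 2 h pH 9.0/5°C reference."""
    if activity_ph9_5c <= 0:
        raise ValueError("reference activity must be positive")
    return 100.0 * activity_ph3_37c / activity_ph9_5c


def is_acid_stable(activity_ph3_37c: float, activity_ph9_5c: float,
                   threshold_pct: float = 50.0) -> bool:
    """Acid-stable per the definition above; use 60, 70, 80 or 90 for the stricter embodiments."""
    return residual_activity_pct(activity_ph3_37c, activity_ph9_5c) >= threshold_pct


if __name__ == "__main__":
    # Hypothetical readings in arbitrary assay units.
    print(residual_activity_pct(0.42, 0.60), is_acid_stable(0.42, 0.60))
```

With these hypothetical readings the residual activity is about 70%, which meets the 50%, 60% and 70% thresholds but not 80% or 90%.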

In particular embodiments, the polypeptide of the invention is i) a bacterial protease; ii) a
protease of the phylum Actinobacteria; iii) of the class Actinobacteria; iv) of the order
Actinomycetales; v) of the family Nocardiopsaceae; vi) of the genus Nocardiopsis; and/or a
protease derived from vii) Nocardiopsis species such as Nocardiopsis alba, Nocardiopsis
antarctica, Nocardiopsis prasina, composta, exhalans, halophila, halotolerans, kunsanensis,
listeri, lucentensis, metallicus, synnemataformans, trehalosi, tropica, umidischolae, xinjiangensis,
or Nocardiopsis dassonvillei, for example Nocardiopsis dassonvillei DSM 43235.

The above taxonomy is according to the chapter: The road map to the Manual by G. M.
Garrity & J. G. Holt in Bergey's Manual of Systematic Bacteriology, 2001, second edition, volume
1, David R. Boone, Richard W. Castenholz.

It will be understood that for the aforementioned species, the invention encompasses both 
the perfect and imperfect states, and other taxonomic equivalents, e.g., anamorphs, regardless of 
the species name by which they are known. Those skilled in the art will readily recognize the 
identity of appropriate equivalents. 

Strains of these species are readily accessible to the public in a number of culture
collections, such as the American Type Culture Collection (ATCC), Deutsche Sammlung von
Mikroorganismen und Zellkulturen GmbH (DSM), Centraalbureau Voor Schimmelcultures (CBS)
and Agricultural Research Service Patent Culture Collection, Northern Regional Research Center
(NRRL). E.g., Nocardiopsis dassonvillei subsp. dassonvillei DSM 43235 is publicly available from
DSMZ (Deutsche Sammlung von Mikroorganismen und Zellkulturen GmbH, Braunschweig,
Germany). The strain was also deposited at other depositary institutions as follows: ATCC 23219,
IMRU 1250, NCTC 10489.

Furthermore, such polypeptides may be identified and obtained from other sources 
including microorganisms isolated from nature (e.g., soil, composts, water, etc.) using the above-
mentioned probes. Techniques for isolating microorganisms from natural habitats are well known
in the art. The nucleic acid sequence may then be derived by similarly screening a genomic or
cDNA library of another microorganism. Once a nucleic acid sequence encoding a polypeptide
has been detected with the probe(s), the sequence may be isolated or cloned by utilizing
techniques which are known to those of ordinary skill in the art (see, e.g., Sambrook et al., 1989,
supra).

As defined herein, an "isolated" polypeptide is a polypeptide which is essentially free of 
other polypeptides, e.g., at least about 20% pure, preferably at least about 40% pure, more 
preferably about 60% pure, even more preferably about 80% pure, most preferably about 90% 
pure, and even most preferably about 95% pure, as determined by SDS-PAGE. 

Polypeptides encoded by nucleic acid sequences of the present invention also include 
fused polypeptides or cleavable fusion polypeptides in which another polypeptide is fused at the
N-terminus or the C-terminus of the polypeptide or fragment thereof. A fused polypeptide is 
produced by fusing a nucleic acid sequence (or a portion thereof) encoding another polypeptide to 
a nucleic acid sequence (or a portion thereof) of the present invention. Techniques for producing 
fusion polypeptides are known in the art, e.g. PCR, or ligating the coding sequences encoding the 
polypeptides so that they are in frame and that expression of the fused polypeptide is under 
control of the same promoter(s) and terminator.

In the present context, non-polar amino acids are G, A, V, L, I, M, P, F or W; and 
uncharged polar amino acids are S, T, N, Q, Y, or C. The terms "non-polar" and "uncharged polar"
when used to describe amino acids in a polypeptide are generally recognized in the art as
characterizing the side-chain of the amino acid. Hence, for instance, the free carboxylic acid of the
C-terminal amino acid in a polypeptide is not considered when determining whether this amino
acid is a non-polar or uncharged polar amino acid.

A preferred embodiment relates to a polypeptide of the first aspect whose mature part is a
wildtype polypeptide; an artificial variant of a wildtype polypeptide, said variant having one or more
amino acid(s) added to the C-terminus as compared to the wildtype, and preferably the one or
more added amino acid(s) is (are) non-polar or uncharged polar, and even more preferably the one or
more added amino acid(s) is one or more of Q, S, V, A, or P; a shuffled polypeptide; or a protein-
engineered polypeptide.

Yet another preferred embodiment relates to a polypeptide of the first aspect, which
comprises at least three non-polar or uncharged polar amino acids within the last four amino acids
of the C-terminus of the polypeptide.
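Combining the amino acid classes defined above with this C-terminal criterion, the check can be sketched in Python as below. The sets restate the one-letter codes from the text; the function name and its `window`/`required` parameters are hypothetical, with defaults matching "at least three of the last four".

```python
# One-letter codes as defined in the text.
NON_POLAR = frozenset("GAVLIMPFW")
UNCHARGED_POLAR = frozenset("STNQYC")


def meets_c_terminal_rule(sequence: str, window: int = 4, required: int = 3) -> bool:
    """True if at least `required` of the last `window` residues are non-polar or uncharged polar."""
    tail = sequence.upper()[-window:]
    count = sum(1 for aa in tail if aa in NON_POLAR or aa in UNCHARGED_POLAR)
    return count >= required


if __name__ == "__main__":
    print(meets_c_terminal_rule("MKTAYQSAP"))  # tail QSAP: all four residues qualify
    print(meets_c_terminal_rule("MKTAYKDEP"))  # tail KDEP: only P qualifies
```

Because the classification concerns side chains only, the sketch simply looks up each one-letter code and ignores the free C-terminal carboxyl group, as the definition above requires.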

Another preferred embodiment relates to a polypeptide of the first aspect, wherein the one
or more added amino acids are selected from the group consisting of: QSHVQSAP, QSAP, QP,
TL, TT, QL, TP, LP, TI, IQ, QP, PI, LT, TQ, IT, QQ, and PQ.

The inventors determined that the polypeptides of the present invention were produced in
even greater yields when they were expressed as mature proteases fused to a heterologous pro-region,
and/or fused to a heterologous secretion signal peptide, as shown in the examples below.
Accordingly, a preferred embodiment relates to the polypeptide according to the first
aspect which when expressed and before maturation comprises a heterologous secretion signal
peptide which is cleaved from the polypeptide when the polypeptide is secreted; preferably the
heterologous secretion signal peptide is derived from a heterologous protease, and preferably the
heterologous secretion signal peptide comprises an amino acid sequence having a sequence
identity of at least 70%, or preferably 75%, 80%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%,
93%, 94%, 95%, 96%, 97%, 98%, or 99%, with the amino acid sequence encoded by
nucleotides 1-81 of SEQ ID NO: 2, or SEQ ID NO: 44.

Accordingly, a preferred embodiment relates to the polypeptide according to the first
aspect which when expressed and before maturation comprises a heterologous pro-region from a
protease; preferably the pro-region is derived from an S2A or S1E protease, more preferably the
pro-region is encoded by a polynucleotide which hybridizes under very low, low, medium-low,
medium, medium-high, high, or very high stringency conditions with a polynucleotide encoding the
pro-region shown in position -166 to -1 of SEQ ID NO: 28, in position 1-166 of SEQ ID NO: 30, in
position -167 to -1 of SEQ ID NO: 33, in position -165 to -1 of SEQ ID NO: 37, in position -165 to
-1 of SEQ ID NO: 41, in position -165 to -1 of SEQ ID NO: 43, in position -165 to -1 of SEQ ID NO:
45, in position 1-165 of SEQ ID NO: 46, in position 1-166 of SEQ ID NO: 47, in position 1-166 of
SEQ ID NO: 48, in position 1-166 of SEQ ID NO: 49, in position 1-166 of SEQ ID NO: 50, in
position 1-165 of SEQ ID NO: 51, in position 1-166 of SEQ ID NO: 52, or in position 1-166 of SEQ
ID NO: 53; and most preferably it is at least 70% identical, or preferably 75%, 80%, 85%, 86%,
87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to the
pro-region shown in position -166 to -1 of SEQ ID NO: 28, in position 1-166 of SEQ ID NO: 30, in
position -167 to -1 of SEQ ID NO: 33, in position -165 to -1 of SEQ ID NO: 37, in position -165 to
-1 of SEQ ID NO: 41, in position -165 to -1 of SEQ ID NO: 43, in position -165 to -1 of SEQ ID NO:
45, in position 1-165 of SEQ ID NO: 46, in position 1-166 of SEQ ID NO: 47, in position 1-166 of
SEQ ID NO: 48, in position 1-166 of SEQ ID NO: 49, in position 1-166 of SEQ ID NO: 50, in
position 1-165 of SEQ ID NO: 51, in position 1-166 of SEQ ID NO: 52, or in position 1-166 of SEQ
ID NO: 53.

Nucleic Acid Sequences

The present invention also relates to isolated nucleic acid sequences that encode a 
polypeptide of the present invention. Particular nucleic acid sequences of the invention are the 
polynucleotides of SEQ ID NO: 1, SEQ ID NO: 2, SEQ ID NO: 25, SEQ ID NO: 31, SEQ ID NO:
32, SEQ ID NO: 36, SEQ ID NO: 40, or SEQ ID NO: 44. Another particular nucleic acid sequence
of the invention is the sequence, preferably the mature polypeptide encoding region thereof, which
is obtainable from genomic DNA from Nocardiopsis dassonvillei subspecies dassonvillei DSM
43235. The present invention also encompasses nucleic acid sequences which encode a
polypeptide having the amino acid sequence of amino acids shown in positions 1 to 168 or
positions -165 to … of SEQ ID NO: 43, which nucleic acid sequences differ from the
corresponding parts of SEQ ID NO: 1 by virtue of the degeneracy of the genetic code. The present
invention also relates to … which encode polypeptide
fragments that have protease activity.
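The point about the degeneracy of the genetic code can be illustrated by translating two different coding sequences into the same peptide. The Python sketch below builds the standard genetic code table; the example sequences are hypothetical and are not taken from any of the SEQ ID NOs.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate an in-frame coding sequence whose length is a multiple of 3."""
    seq = dna.upper()
    if len(seq) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


if __name__ == "__main__":
    # Two different nucleotide sequences that encode the same peptide (Met-Ala-Lys-Ser).
    print(translate("ATGGCTAAATCT"), translate("ATGGCGAAGAGC"))
```

The same idea underlies the codon-usage adaptation mentioned later in this section: synonymous codons can be swapped to suit a production host without changing the encoded amino acid sequence.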

A subsequence of … is a nucleic acid sequence from which one or more
nucleotides … more preferably at least 300 nucleotides, even more preferably at least 375, …
450, 500, 531, 600, 700, 800, 900 or 1000 nucleotides. The present invention also relates to
nucleotide sequences which have a degree of identity to the polynucleotide of SEQ ID NO: 1
of at least … 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or at
least 99%.

The techniques used to isolate or clone a nucleic acid sequence encoding a polypeptide … nucleic
acid sequences of the present invention from such genomic DNA can be effected, e.g., by using the
well known polymerase chain reaction (PCR) or antibody screening of expression libraries to detect
cloned DNA fragments … (see, e.g., Innis et al., 1990, PCR: A Guide to Methods and Application,
Academic Press, New York). Other nucleic acid amplification procedures such as ligase chain
reaction (LCR), ligated activated transcription (LAT) and nucleic acid sequence-based
amplification (NASBA) may be used. … may be an allelic or species variant of the polypeptide
encoding region of the nucleic acid sequence.

The term "isolated nucleic acid sequence" as used herein refers to a nucleic acid sequence
which is essentially free of other nucleic acid sequences, e.g., at least about 20% pure, preferably
at least about 40% pure, more preferably at least about 60% pure, even more preferably at … as
determined by agarose electrophoresis. For example, an isolated nucleic acid sequence can be
obtained by standard cloning procedures used in genetic engineering to relocate the nucleic acid
sequence from its natural location to a different site where it will be reproduced. The cloning
procedures may involve excision and isolation of a desired nucleic acid fragment comprising the
nucleic acid sequence encoding the polypeptide, insertion of the fragment into a vector molecule,
and incorporation of the recombinant vector into a host cell where multiple copies or clones of the
nucleic acid sequence will be replicated. The nucleic acid sequence may be of genomic, cDNA,
RNA, semisynthetic, synthetic origin, or any combinations thereof.

Modification of a nucleic acid sequence encoding a polypeptide of the present invention
may be necessary for the synthesis of polypeptides substantially similar to the polypeptide. The
term "substantially similar" to the polypeptide refers to non-naturally occurring forms of the
polypeptide. These polypeptides may differ in some engineered way from the polypeptide isolated
from its native source, e.g., variants that differ in specific activity, thermostability, pH optimum,
allergenicity, or the like. The variant sequence may be constructed on the basis of the nucleic acid
sequence presented as the polypeptide encoding part of the polynucleotides of the invention, e.g.,
a subsequence thereof, and/or by introduction of nucleotide substitutions which do not give rise to
another amino acid sequence of the polypeptide encoded by the nucleic acid sequence, but which
correspond to the codon usage of the host organism intended for production of the protease, or by
introduction of nucleotide substitutions which may give rise to a different amino acid sequence.
For a general description of nucleotide substitution, see, e.g., Ford et al., 1991, Protein
Expression and Purification 2: 95-107. Low-allergenic polypeptides can, e.g., be prepared as
described above.
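Substitutions that follow the codon usage of the production host leave the encoded amino acid sequence unchanged. A minimal Python sketch of one simple strategy, replacing each codon with a single preferred synonymous codon, follows. The preference table is hypothetical and would in practice come from codon usage data for the chosen host.

```python
from itertools import product

# Standard genetic code (translation table 1) in TCAG order; '*' = stop.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons(dna: str) -> list:
    """Split a coding sequence into codons."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [dna[i:i + 3] for i in range(0, len(dna), 3)]


def translate(dna: str) -> str:
    return "".join(CODON_TABLE[c] for c in codons(dna))


def optimize_codons(dna: str, preferred: dict) -> str:
    """Swap each codon for the host-preferred synonymous codon.

    `preferred` maps a one-letter amino acid code to a codon; amino acids
    not in the map keep their original codon.
    """
    out = []
    for codon in codons(dna):
        aa = CODON_TABLE[codon]
        new = preferred.get(aa, codon).upper()
        # Guard against a preference table entry that is not synonymous.
        if CODON_TABLE[new] != aa:
            raise ValueError(f"{new} does not encode {aa}")
        out.append(new)
    return "".join(out)


if __name__ == "__main__":
    # Hypothetical preference table, for illustration only.
    preferred = {"A": "GCG", "K": "AAA"}
    original = "ATGGCTAAG"
    optimized = optimize_codons(original, preferred)
    print(optimized)  # ATGGCGAAA
    assert translate(optimized) == translate(original)
```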

It will be apparent to those skilled in the art that such substitutions can be made outside
the regions critical to the function of the molecule and still result in an active polypeptide. Amino
acid residues essential to the activity of the polypeptide encoded by the isolated nucleic acid
sequence of the invention, and therefore preferably not subject to substitution, may be identified
according to procedures known in the art, such as site-directed mutagenesis or alanine-scanning
mutagenesis (see, e.g., Cunningham and Wells, 1989, Science 244: 1081-1085). In the latter
technique, mutations are introduced at every positively charged residue in the molecule, and the
resultant mutant molecules are tested for protease activity to identify amino acid residues that are
critical to the activity of the molecule. Sites of substrate-protease interaction can also be
determined by analysis of the three-dimensional structure as determined by such techniques as
nuclear magnetic resonance analysis, crystallography or photoaffinity labelling (see, e.g., de Vos
et al., 1992, Science 255: 306-312; Smith et al., 1992, Journal of Molecular Biology 224: 899-904;
Wlodaver et al., 1992, FEBS Letters 309: 59-64).
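The scanning approach described above can be planned programmatically. This Python sketch lists single alanine substitutions at each positively charged residue of a sequence, following the description in the text. Treating only Lys and Arg as positively charged (His excluded) and numbering from 1 at the first residue given are assumptions, and the example sequence is hypothetical.

```python
# Assumption: Lys (K) and Arg (R) are treated as the positively charged residues.
POSITIVE = frozenset("KR")


def alanine_scan(protein: str, targets=POSITIVE) -> list:
    """Return (variant_name, mutant_sequence) for each single Ala substitution.

    Names follow the usual wild-type/position/substitution notation,
    e.g. 'K2A', with 1-based positions in the given sequence.
    """
    seq = protein.upper()
    variants = []
    for pos, residue in enumerate(seq, start=1):
        if residue in targets:
            mutant = seq[:pos - 1] + "A" + seq[pos:]
            variants.append((f"{residue}{pos}A", mutant))
    return variants


if __name__ == "__main__":
    for name, mutant in alanine_scan("MKAR"):
        print(name, mutant)
    # K2A MAAR
    # R4A MKAA
```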
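The term "nucleic acid construct" as used herein refers to a nucleic acid molecule, either single- or double-stranded, which is isolated from a naturally occurring gene or which has been modified to contain segments of nucleic acid combined and juxtaposed in a manner that would not otherwise exist in nature. The term "nucleic acid construct" is synonymous with the term "expression cassette"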
when the nucleic acid construct contains all the control sequences required for expression of a 
coding sequence of the present invention. The term "coding sequence" is defined herein as a 
nucleic acid sequence that directly specifies the amino acid sequence of its protein product. The 
boundaries of the coding sequence are generally determined by a ribosome binding site 
(prokaryotes) or by the ATG start codon (eukaryotes) located just upstream of the open reading 
frame at the 5' end of the mRNA and a transcription terminator sequence located just downstream 
of the open reading frame at the 3' end of the mRNA. A coding sequence can include, but is not 
limited to, DNA, cDNA, and recombinant nucleic acid sequences.

An isolated nucleic acid sequence encoding a polypeptide of the present invention may be 
manipulated in a variety of ways to provide for expression of the polypeptide. Manipulation of the 
nucleic acid sequence prior to its insertion into a vector may be desirable or necessary depending 
on the expression vector. The techniques for modifying nucleic acid sequences utilizing 
recombinant DNA methods are well known in the art. 

The term "control sequences" is defined herein to include all components that are 
necessary or advantageous for the expression of a polypeptide of the present invention. Each 
control sequence may be native or foreign to the nucleic acid sequence encoding the polypeptide.
Such control sequences include, but are not limited to, a leader, polyadenylation sequence,
propeptide sequence, promoter, signal peptide sequence, and transcription terminator. At a
minimum, the control sequences include a promoter, and transcriptional and translational stop
signals. The control sequences may be provided with linkers for the purpose of introducing
specific restriction sites facilitating ligation of the control sequences with the coding region of the
nucleic acid sequence encoding a polypeptide. The term "operably linked" is defined herein as a
configuration in which a control sequence is appropriately placed at a position relative to the
coding sequence of the DNA sequence such that the control sequence directs the expression of a
polypeptide.

The control sequence may be an appropriate promoter sequence, a nucleic acid sequence 
that is recognized by a host cell for expression of the nucleic acid sequence. The promoter 
sequence contains transcriptional control sequences that mediate the expression of the 
polypeptide. The promoter may be any nucleic acid sequence which shows transcriptional activity 
in the host cell of choice including mutant, truncated, and hybrid promoters, and may be obtained 
from genes encoding extracellular or intracellular polypeptides either homologous or heterologous 
to the host cell. 

Examples of suitable promoters for directing the transcription of the nucleic acid constructs
of the present invention, especially in a bacterial host cell, are the promoters obtained from the
E. coli lac operon, Streptomyces coelicolor agarase gene (dagA), Bacillus subtilis levansucrase
gene (sacB), Bacillus licheniformis alpha-amylase gene (amyL), Bacillus stearothermophilus
maltogenic amylase gene (amyM), Bacillus amyloliquefaciens alpha-amylase gene (amyQ),
Bacillus licheniformis penicillinase gene (penP), Bacillus subtilis xylA and xylB genes, and
prokaryotic beta-lactamase gene (Villa-Kamaroff et al., 1978, Proceedings of the National
Academy of Sciences USA 75: 3727-3731), as well as the tac promoter (DeBoer et al., 1983,
Proceedings of the National Academy of Sciences USA 80: 21-25). Further promoters are
described in "Useful proteins from recombinant bacteria" in Scientific American, 1980, 242:
74-94; and in Sambrook et al., 1989, supra.

Examples of suitable promoters for directing the transcription of the nucleic acid constructs
of the present invention in a filamentous fungal host cell are promoters obtained from the genes
for Aspergillus oryzae TAKA amylase, Rhizomucor miehei aspartic proteinase, Aspergillus niger
neutral alpha-amylase, Aspergillus niger acid stable alpha-amylase, Aspergillus niger or
Aspergillus awamori glucoamylase (glaA), Rhizomucor miehei lipase, Aspergillus oryzae alkaline
protease, Aspergillus oryzae triose phosphate isomerase, Aspergillus nidulans acetamidase, and
Fusarium oxysporum trypsin-like protease (WO 96/00787), as well as the NA2-tpi promoter (a
hybrid of the promoters from the genes for Aspergillus niger neutral alpha-amylase and
Aspergillus oryzae triose phosphate isomerase), and mutant, truncated, and hybrid promoters
thereof.

In a yeast host, useful promoters are obtained from the genes for Saccharomyces
cerevisiae enolase (ENO-1), Saccharomyces cerevisiae galactokinase (GAL1), Saccharomyces
cerevisiae alcohol dehydrogenase/glyceraldehyde-3-phosphate dehydrogenase (ADH2/GAP), and
Saccharomyces cerevisiae 3-phosphoglycerate kinase. Other useful promoters for yeast host cells
are described by Romanos et al., 1992, Yeast 8: 423-488.

The control sequence may also be a suitable transcription terminator sequence, a
sequence recognized by a host cell to terminate transcription. The terminator sequence is
operably linked to the 3' terminus of the nucleic acid sequence encoding the polypeptide. Any
terminator which is functional in the host cell of choice may be used in the present invention.

Preferred terminators for filamentous fungal host cells are obtained from the genes for
Aspergillus oryzae TAKA amylase, Aspergillus niger glucoamylase, Aspergillus nidulans
anthranilate synthase, Aspergillus niger alpha-glucosidase, and Fusarium oxysporum trypsin-like
protease.

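Preferred terminators for yeast host cells are obtained from the genes for Saccharomyces
cerevisiae enolase, Saccharomyces cerevisiae cytochrome C (CYC1), and Saccharomyces
cerevisiae glyceraldehyde-3-phosphate dehydrogenase. Other useful terminators for yeast host
cells are described by Romanos et al., 1992, supra.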
Any signal peptide coding region which directs the expressed polypeptide into the secretory
pathway of a host cell of choice may be used in the present invention.

Effective signal peptide coding regions for bacterial host cells are the signal peptide coding 
regions obtained from the genes for Bacillus NCIB 11837 maltogenic amylase, Bacillus
stearothermophilus alpha-amylase, Bacillus licheniformis subtilisin, Bacillus licheniformis
alpha-amylase, Bacillus stearothermophilus neutral proteases (nprT, nprS, nprM), and Bacillus subtilis
prsA. Further signal peptides are described by Simonen and Palva, 1993, Microbiological 
Reviews 57: 109-137. 

Effective signal peptide coding regions for filamentous fungal host cells are the signal 
peptide coding regions obtained from the genes for Aspergillus oryzae TAKA amylase, Aspergillus
niger neutral amylase, Aspergillus niger glucoamylase, Rhizomucor miehei aspartic proteinase,
Humicola insolens cellulase, and Humicola lanuginosa lipase.

Useful signal peptides for yeast host cells are obtained from the genes for Saccharomyces
cerevisiae alpha-factor and Saccharomyces cerevisiae invertase. Other useful signal peptide
coding regions are described by Romanos et al., 1992, supra.

The control sequence may also be a propeptide coding region that codes for an amino acid
sequence positioned at the amino terminus of a polypeptide. The resultant polypeptide is known
as a proenzyme or propolypeptide (or a zymogen in some cases). A propolypeptide is generally
inactive and can be converted to a mature active polypeptide by catalytic or autocatalytic cleavage
of the propeptide from the propolypeptide. The propeptide coding region may be obtained from the
genes for Bacillus subtilis alkaline protease (aprE), Bacillus subtilis neutral protease (nprT),
Saccharomyces cerevisiae alpha-factor, Rhizomucor miehei aspartic proteinase, and
Myceliophthora thermophila laccase (WO 95/33836).

Where both signal peptide and propeptide regions are present at the amino terminus of a 
polypeptide, the propeptide region is positioned next to the amino terminus of a polypeptide and 
the signal peptide region is positioned next to the amino terminus of the propeptide region. 
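The arrangement just described (signal peptide, then propeptide, then the mature polypeptide, reading from the amino terminus) can be written down directly. The Python sketch below also labels residues so that the mature part starts at position 1 and the preceding signal and propeptide residues receive negative numbers; that numbering convention is assumed here for illustration, and the sequences are placeholders.

```python
def assemble_precursor(signal: str, propeptide: str, mature: str) -> str:
    """Order the regions N- to C-terminally: signal, propeptide, mature."""
    return signal + propeptide + mature


def position_labels(signal: str, propeptide: str, mature: str) -> list:
    """Number the mature part 1..n and the pre/pro residues -m..-1 (no position 0)."""
    n_prepro = len(signal) + len(propeptide)
    return list(range(-n_prepro, 0)) + list(range(1, len(mature) + 1))


def mature_part(precursor: str, signal_len: int, propeptide_len: int) -> str:
    """Mature polypeptide left after removing signal peptide and propeptide."""
    return precursor[signal_len + propeptide_len:]


if __name__ == "__main__":
    signal, pro, mature = "MSS", "PP", "AKR"  # placeholder sequences
    precursor = assemble_precursor(signal, pro, mature)
    print(precursor)  # MSSPPAKR
    print(position_labels(signal, pro, mature))  # [-5, -4, -3, -2, -1, 1, 2, 3]
    print(mature_part(precursor, len(signal), len(pro)))  # AKR
```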

It may also be desirable to add regulatory sequences which allow the regulation of the 
expression of the polypeptide relative to the growth of the host cell. Examples of regulatory 
systems are those which cause the expression of the gene to be turned on or off in response to a 
chemical or physical stimulus, including the presence of a regulatory compound. Regulatory 
systems in prokaryotic systems include the lac, tac, and trp operator systems. In yeast, the ADH2 
system or GAL1 system may be used. In filamentous fungi, the TAKA alpha-amylase promoter,
Aspergillus niger glucoamylase promoter, and Aspergillus oryzae glucoamylase promoter may be 
used as regulatory sequences. Other examples of regulatory sequences are those which allow for 
gene amplification. In eukaryotic systems, these include the dihydrofolate reductase gene which is 
amplified in the presence of methotrexate, and the metallothionein genes which are amplified with
heavy metals. In these cases, the nucleic acid sequence encoding the polypeptide would be
operably linked with the regulatory sequence. 



Expression Vectors 

The present invention also relates to recombinant expression vectors comprising a nucleic 
acid sequence of the present invention, a promoter, and transcriptional and translational stop
signals. The various nucleic acid and control sequences described above may be joined together
to produce a recombinant expression vector which may include one or more convenient restriction
sites to allow for insertion or substitution of the nucleic acid sequence encoding the polypeptide at
such sites. Alternatively, the nucleic acid sequence of the present invention may be expressed by
inserting the nucleic acid sequence or a nucleic acid construct comprising the sequence into an 
appropriate vector for expression. In creating the expression vector, the coding sequence is 
located in the vector so that the coding sequence is operably linked with the appropriate control 
sequences for expression. 
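Convenient restriction sites for inserting the coding sequence are normally chosen so that they occur once in the vector and not inside the insert. This Python sketch scans sequences for the recognition sites of three common type II enzymes (EcoRI GAATTC, BamHI GGATCC, HindIII AAGCTT). The enzyme list is a small illustrative subset, the uniqueness rule is a simplifying assumption, and the sequences are hypothetical.

```python
import re

# Recognition sequences of three common type II restriction enzymes.
# All three are palindromic, so scanning one strand finds every site.
SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}


def find_sites(dna: str, sites: dict = SITES) -> dict:
    """Map each enzyme to the 0-based start positions of its sites."""
    dna = dna.upper()
    hits = {}
    for enzyme, motif in sites.items():
        # A zero-width lookahead also reports overlapping occurrences.
        positions = [m.start() for m in re.finditer(f"(?={motif})", dna)]
        if positions:
            hits[enzyme] = positions
    return hits


def usable_for_insertion(vector: str, insert: str, sites: dict = SITES) -> list:
    """Enzymes cutting the vector exactly once and not cutting the insert."""
    in_vector = find_sites(vector, sites)
    in_insert = find_sites(insert, sites)
    return [e for e, pos in in_vector.items() if len(pos) == 1 and e not in in_insert]


if __name__ == "__main__":
    vector = "GAATTCAAAGGATCCTTTGGATCC"  # hypothetical
    insert = "ATGGCTAAATAA"  # hypothetical
    print(find_sites(vector))  # {'EcoRI': [0], 'BamHI': [9, 18]}
    print(usable_for_insertion(vector, insert))  # ['EcoRI']
```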

The recombinant expression vector may be any vector (e.g., a plasmid or virus) which can 
be conveniently subjected to recombinant DNA procedures and can bring about the expression of 
the nucleic acid sequence. The choice of the vector will typically depend on the compatibility of the
vector with the host cell into which the vector is to be introduced. The vectors may be linear or
closed circular plasmids.

The vector may be an autonomously replicating vector, i.e., a vector which exists as an
extrachromosomal entity, the replication of which is independent of chromosomal replication, e.g., 
a plasmid, an extrachromosomal element, a minichromosome. or an artificial chromosome. The 
vector may contain any means for assuring self-replication. Alternatively, the vector may be one 
which, when introduced into the host cell, is integrated into the genome and replicated together 
with the chromosome(s) into which it has been integrated. Furthermore, a single vector or plasmid 
or two or more vectors or plasmids which together contain the total DNA to be introduced into the 
genome of the host cell, or a transposon may be used. 

The vectors of the present invention preferably contain one or more selectable markers 
which permit easy selection of transformed cells. A selectable marker is a gene the product of 
which provides for biocide or viral resistance, resistance to heavy metals, prototrophy to 
auxotrophs, and the like. Examples of bacterial selectable markers are the dal genes from Bacillus
subtilis or Bacillus licheniformis. Suitable markers for yeast host cells are ADE2, HIS3, LEU2,
LYS2, MET3, TRP1, and URA3. Selectable markers for use in a filamentous fungal host cell
include, but are not limited to, amdS (acetamidase), argB (ornithine carbamoyltransferase), bar
(phosphinothricin acetyltransferase), hygB (hygromycin phosphotransferase), niaD (nitrate
reductase), pyrG (orotidine-5'-phosphate decarboxylase), sC (sulfate adenyltransferase), and trpC
(anthranilate synthase), as well as equivalents thereof. Preferred for use in an Aspergillus cell are
the amdS and pyrG genes of Aspergillus nidulans or Aspergillus oryzae and the bar gene of
Streptomyces hygroscopicus.

The vectors of the present invention preferably contain an element(s) that permits stable
integration of the vector into the host cell's genome or autonomous replication of the vector in the 
cell independent of the genome. 

For integration into the host cell genome, the vector may rely on the nucleic acid sequence 
encoding the polypeptide or any other element of the vector for stable integration of the vector into 
the genome by homologous or nonhomologous recombination. Alternatively, the vector may 
contain additional nucleic acid sequences for directing integration by homologous recombination 
into the genome of the host cell. The additional nucleic acid sequences enable the vector to be 
integrated into the host cell genome at a precise location(s) in the chromosome(s). To increase
the likelihood of integration at a precise location, the integrational elements should preferably
contain a sufficient number of nucleotides, such as 100 to 1,500 base pairs, preferably 400 to
1,500 base pairs, and most preferably 800 to 1,500 base pairs, which are highly homologous with
the corresponding target sequence to enhance the probability of homologous recombination. The
integrational elements may be any sequence that is homologous with the target sequence in the
genome of the host cell. Furthermore, the integrational elements may be non-encoding or
encoding nucleic acid sequences. On the other hand, the vector may be integrated into the
genome of the host cell by non-homologous recombination.
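The length ranges given above for integrational elements can be applied as a simple check when designing homology arms. This Python sketch classifies an arm length against those ranges only; it does not assess homology to the target, which also matters, and because the text gives the ranges as examples ("such as"), lengths outside them are not necessarily unusable. The example lengths are arbitrary.

```python
def classify_arm(length_bp: int) -> str:
    """Place a homology-arm length in the ranges stated in the text."""
    if 800 <= length_bp <= 1500:
        return "most preferred (800-1,500 bp)"
    if 400 <= length_bp <= 1500:
        return "preferred (400-1,500 bp)"
    if 100 <= length_bp <= 1500:
        return "suitable (100-1,500 bp)"
    return "outside the stated ranges"


if __name__ == "__main__":
    for arm in (50, 250, 600, 1000, 2000):
        print(arm, classify_arm(arm))
```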

For autonomous replication, the vector may further comprise an origin of replication 
enabling the vector to replicate autonomously in the host cell in question. Examples of bacterial 
origins of replication are the origins of replication of plasmids pBR322, pUC19, pACYC177, and
pACYC184 permitting replication in E. coli, and pUB110, pE194, pTA1060, and pAMB1 permitting
replication in Bacillus. Examples of origins of replication for use in a yeast host cell are the 2
micron origin of replication, ARS1, ARS4, the combination of ARS1 and CEN3, and the
combination of ARS4 and CEN6. The origin of replication may be one having a mutation which
makes its function temperature-sensitive in the host cell (see, e.g., Ehrlich, 1978, Proceedings of
the National Academy of Sciences USA 75: 1433). 

More than one copy of a nucleic acid sequence of the present invention may be inserted 
into the host cell to increase production of the gene product. An increase in the copy number of 
the nucleic acid sequence can be obtained by integrating at least one additional copy of the 
sequence into the host cell genome or by including an amplifiable selectable marker gene with the 
nucleic acid sequence where cells containing amplified copies of the selectable marker gene, and 
thereby additional copies of the nucleic acid sequence, can be selected for by cultivating the cells 
in the presence of the appropriate selectable agent. 
The procedures used to ligate the elements described above to construct the recombinant
expression vectors of the present invention are well known to one skilled in the art (see, e.g.,
Sambrook et al., 1989, supra).

The protease may also be co-expressed together with at least one other enzyme of 
interest for animal feed, such as phytase (EC 3.1.3.8 or 3.1.3.26); xylanase (EC 3.2.1.8); 
galactanase (EC 3.2.1.89); alpha-galactosidase (EC 3.2.1.22); protease (EC 3.4.-.-);
phospholipase A1 (EC 3.1.1.32); phospholipase A2 (EC 3.1.1.4); lysophospholipase (EC 3.1.1.5);
phospholipase C (EC 3.1.4.3); phospholipase D (EC 3.1.4.4); and/or beta-glucanase (EC 3.2.1.4 or
EC 3.2.1.6). 

The enzymes may be co-expressed from different vectors, from one vector, or using a 
mixture of both techniques. When using different vectors, the vectors may have different 
selectable markers, and different origins of replication. When using only one vector, the genes can 
be expressed from one or more promoters. If cloned under the regulation of one promoter (di- or 
multi-cistronic), the order in which the genes are cloned may affect the expression levels of the 
proteins. The protease may also be expressed as a fusion protein, i.e., the gene encoding the
protease is fused in frame to the gene encoding another protein. This protein may be
another enzyme or a functional domain from another enzyme.
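For a fusion protein, the two coding sequences must be joined so that a single reading frame runs through both. This Python sketch checks that condition. It assumes the first part starts with ATG and has had its own stop codon removed, the second part ends with a stop codon, and no linker is present; the sequences are hypothetical.

```python
STOP_CODONS = frozenset({"TAA", "TAG", "TGA"})  # standard genetic code


def is_in_frame_fusion(first_orf: str, second_orf: str) -> bool:
    """True if first_orf + second_orf forms one uninterrupted reading frame."""
    first, second = first_orf.upper(), second_orf.upper()
    # Each part must be a whole number of codons, or the frame shifts at the junction.
    if not first or not second or len(first) % 3 or len(second) % 3:
        return False
    fused = first + second
    codons = [fused[i:i + 3] for i in range(0, len(fused), 3)]
    return (
        codons[0] == "ATG"
        and codons[-1] in STOP_CODONS
        and not any(c in STOP_CODONS for c in codons[:-1])
    )


if __name__ == "__main__":
    print(is_in_frame_fusion("ATGGCT", "AAATAA"))   # True
    print(is_in_frame_fusion("ATGGCTA", "AAATAA"))  # False: frame shift
    print(is_in_frame_fusion("ATGTAA", "AAATAA"))   # False: internal stop
```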

Accordingly, the invention also relates to a recombinant expression vector or
polynucleotide construct comprising a polynucleotide of the invention.

Host Cells 

The present invention also relates to recombinant host cells, comprising a nucleic acid 
sequence of the invention, which are advantageously used in the recombinant production of the 
polypeptides. A vector comprising a nucleic acid sequence of the present invention is introduced 
into a host cell so that the vector is maintained as a chromosomal integrant or as a self-replicating 
extra-chromosomal vector as described earlier. The term "host cell" encompasses any progeny of 
a parent cell that is not identical to the parent cell due to mutations that occur during replication. 
The choice of a host cell will to a large extent depend upon the gene encoding the polypeptide 
and its source. The host cell may be a unicellular microorganism, e.g., a prokaryote, or a non- 
unicellular microorganism, e.g., a eukaryote. 

Useful unicellular cells are bacterial cells such as gram positive bacteria including, but not
limited to, a Bacillus cell, or a Streptomyces cell, or cells of lactic acid bacteria; or gram negative
bacteria such as E. coli and Pseudomonas sp. Lactic acid bacteria include, but are not limited to,
species of the genera Lactococcus, Lactobacillus, Leuconostoc, Streptococcus, Pediococcus, and
Enterococcus. Useful gram positive bacterial cells include, but are not limited to, a Bacillus cell,
e.g., Bacillus alkalophilus, Bacillus amyloliquefaciens, Bacillus brevis, Bacillus circulans, Bacillus
clausii, Bacillus coagulans, Bacillus lautus, Bacillus lentus, Bacillus licheniformis, Bacillus
megaterium, Bacillus stearothermophilus, Bacillus subtilis, and Bacillus thuringiensis; or a
Streptomyces cell, e.g., Streptomyces lividans or Streptomyces murinus. In a preferred
embodiment, the bacterial host cell is a Bacillus lentus, Bacillus licheniformis, Bacillus
stearothermophilus or Bacillus subtilis cell. In another preferred embodiment, the Bacillus cell is
an alkalophilic Bacillus.

The introduction of a vector into a bacterial host cell may, for instance, be effected by
protoplast transformation (see, e.g., Chang and Cohen, 1979, Molecular General Genetics 168:
111-115), using competent cells (see, e.g., Young and Spizizen, 1961, Journal of Bacteriology 81:
823-829, or Dubnau and Davidoff-Abelson, 1971, Journal of Molecular Biology 56: 209-221),
electroporation (see, e.g., Shigekawa and Dower, 1988, Biotechniques 6: 742-751), or conjugation
(see, e.g., Koehler and Thorne, 1987, Journal of Bacteriology 169: 5271-5278). The host cell may
be a eukaryote, such as a non-human animal cell, an insect cell, a plant cell, or a fungal cell. In 
one particular embodiment, the host cell is a fungal cell. "Fungi" as used herein includes the phyla 
Ascomycota, Basidiomycota, Chytridiomycota, and Zygomycota (as defined by Hawksworth et al., 
in Ainsworth and Bisby's Dictionary of The Fungi, 8th edition, 1995, CAB International, University
Press, Cambridge, UK) as well as the Oomycota (as cited in Hawksworth et al., 1995, supra, page
171) and all mitosporic fungi (Hawksworth et al., 1995, supra). 

In another particular embodiment, the fungal host cell is a yeast cell. "Yeast" as used 
herein includes ascosporogenous yeast (Endomycetales), basidiosporogenous yeast, and yeast 
belonging to the Fungi Imperfecti (Blastomycetes). Since the classification of yeast may change in
the future, for the purposes of this invention, yeast shall be defined as described in Biology and
Activities of Yeast (Skinner, F.A., Passmore, S.M., and Davenport, R.R., eds, Soc. App. Bacteriol.
Symposium Series No. 9, 1980). 

The yeast host cell may be a Candida, Hansenula, Kluyveromyces, Pichia,
Saccharomyces, Schizosaccharomyces, or Yarrowia cell.

The fungal host cell may be a filamentous fungal cell. "Filamentous fungi" include all 
filamentous forms of the subdivision Eumycota and Oomycota (as defined by Hawksworth et al., 
1995, supra). The filamentous fungi are characterized by a mycelial wall composed of chitin,
cellulose, glucan, chitosan, mannan, and other complex polysaccharides. Vegetative growth is by
hyphal elongation and carbon catabolism is obligately aerobic. In contrast, vegetative growth by
yeasts such as Saccharomyces cerevisiae is by budding of a unicellular thallus and carbon 
catabolism may be fermentative. 
Examples of filamentous fungal host cells are cells of species of, but not limited to,
Acremonium, Aspergillus, Fusarium, Humicola, Mucor, Myceliophthora, Neurospora, Penicillium,
Thielavia, Tolypocladium, or Trichoderma.

Fungal cells may be transformed by a process involving protoplast formation,
transformation of the protoplasts, and regeneration of the cell wall in a manner known per se.
Suitable procedures for transformation of Aspergillus host cells are described in EP 238 023 and
Yelton et al., 1984, Proceedings of the National Academy of Sciences USA 81: 1470-1474.
Suitable methods for transforming Fusarium species are described by Malardier et al., 1989, Gene
78: 147-156 and WO 96/00787. Yeast may be transformed using the procedures described by
Becker and Guarente, in Abelson, J.N. and Simon, M.I., editors, Guide to Yeast Genetics and
Molecular Biology, Methods in Enzymology, Volume 194, pp. 182-187, Academic Press, Inc., New
York; Ito et al., 1983, Journal of Bacteriology 153: 163; and Hinnen et al., 1978, Proceedings of
the National Academy of Sciences USA 75: 1920.

The invention relates to a recombinant host cell comprising a polynucleotide of the 
invention, or an expression vector or polynucleotide construct of the invention. In a preferred 
embodiment, the recombinant host cell is a Bacillus cell.
Plants

The present invention also relates to a transgenic plant, plant part, or plant cell which has 
been transformed with a nucleic acid sequence encoding a polypeptide having protease activity of 
the present invention so as to express and produce the polypeptide in recoverable quantities. The 
polypeptide may be recovered from the plant or plant part. Alternatively, the plant or plant part
containing the recombinant polypeptide may be used as such for improving the quality of a food or
feed, e.g., improving nutritional value, palatability, and rheological properties, or to destroy an
antinutritive factor.

In a particular embodiment, the polypeptide is targeted to the endosperm storage 
vacuoles in seeds. This can be obtained by synthesizing it as a precursor with a suitable signal
peptide; see Horvath et al., PNAS, Feb. 15, 2000, vol. 97, no. 4, pp. 1914-1919.

The transgenic plant can be dicotyledonous (a dicot) or monocotyledonous (a monocot) 
or engineered variants thereof. Examples of monocot plants are grasses, such as meadow grass
(blue grass, Poa), forage grass such as Festuca and Lolium, temperate grass, such as Agrostis, and
cereals, e.g., wheat, oats, rye, barley, rice, sorghum, and maize (corn). Examples of dicot plants
are tobacco, legumes, such as lupins, potato, sugar beet, pea, bean and soybean, and cruciferous
plants (family Brassicaceae), such as cauliflower, rape seed, and the closely related model
organism Arabidopsis thaliana. Low-phytate plants as described, e.g., in US patent no. 5,689,054
and US patent no. 6,111,168 are examples of engineered plants.
Examples of plant parts are stem, callus, leaves, root, fruits, seeds, and tubers. Also
specific plant tissues, such as chloroplast, apoplast, mitochondria, vacuole, peroxisomes and 
cytoplasm are considered to be a plant part. Furthermore, any plant cell, whatever the tissue 
origin, is considered to be a plant part. 

Also included within the scope of the present invention are the progeny of such plants,
plant parts and plant cells.

The transgenic plant or plant cell expressing a polypeptide of the present invention may 
be constructed in accordance with methods known in the art. Briefly, the plant or plant cell is
constructed by incorporating one or more expression constructs encoding a polypeptide of the
present invention into the plant host genome and propagating the resulting modified plant or plant
cell into a transgenic plant or plant cell. 

Conveniently, the expression construct is a nucleic acid construct which comprises a 
nucleic acid sequence encoding a polypeptide of the present invention operably linked with
appropriate regulatory sequences required for expression of the nucleic acid sequence in the plant
or plant part of choice. Furthermore, the expression construct may comprise a selectable marker
useful for identifying host cells into which the expression construct has been integrated and DNA
sequences necessary for introduction of the construct into the plant in question (the latter depends
on the DNA introduction method to be used).

The choice of regulatory sequences, such as promoter and terminator sequences and 
optionally signal or transit sequences are determined, for example, on the basis of when, where 
and how the polypeptide is desired to be expressed. For instance, the expression of the gene 
encoding a polypeptide of the present invention may be constitutive or inducible, or may be 
developmental stage or tissue specific, and the gene product may be targeted to a specific tissue
or plant part such as seeds or leaves. Regulatory sequences are, for example, described by
Tague et al., 1988, Plant Physiology 86: 506.

For constitutive expression, the 35S-CaMV promoter may be used (Franck et al., 1980,
Cell 21: 285-294). Organ-specific promoters may be, for example, a promoter from storage sink
tissues such as seeds, potato tubers, and fruits (Edwards & Coruzzi, 1990, Ann. Rev. Genet. 24:
275-303), or from metabolic sink tissues such as meristems (Ito et al., 1994, Plant Mol. Biol. 24:
863-878), a seed specific promoter such as the glutelin, prolamin, globulin, or albumin promoter
from rice (Wu et al., 1998, Plant and Cell Physiology 39: 885-889), a Vicia faba promoter from the
legumin B4 and the unknown seed protein gene from Vicia faba (Conrad et al., 1998, Journal of
Plant Physiology 152: 708-711), a promoter from a seed oil body protein (Chen et al., 1998, Plant
and Cell Physiology 39: 935-941), the storage protein napA promoter from Brassica napus, or any
other seed specific promoter known in the art, e.g., as described in WO 91/14772. Furthermore,
the promoter may be a leaf specific promoter such as the rbcs promoter from rice or tomato
(Kyozuka et al., 1993, Plant Physiology 102: 991-1000), the chlorella virus adenine
methyltransferase gene promoter (Mitra and Higgins, 1994, Plant Molecular Biology 26: 85-93), or
the aldP gene promoter from rice (Kagaya et al., 1995, Molecular and General Genetics 248:
668-674), or a wound inducible promoter such as the potato pin2 promoter (Xu et al., 1993, Plant
Molecular Biology 22: 573-588).

A promoter enhancer element may also be used to achieve higher expression of the 
protease in the plant. For instance, the promoter enhancer element may be an intron which is
placed between the promoter and the nucleotide sequence encoding a polypeptide of the present
invention. For instance, Xu et al., 1993, supra, disclose the use of the first intron of the rice actin 1
gene to enhance expression.

Still further, the codon usage may be optimized for the plant species in question to 
improve expression (see Horvath et al., referred to above).

The selectable marker gene and any other parts of the expression construct may be 
chosen from those available in the art. 

The nucleic acid construct is incorporated into the plant genome according to
conventional techniques known in the art, including Agrobacterium-mediated transformation, virus-
mediated transformation, microinjection, particle bombardment, biolistic transformation and
electroporation (Gasser et al., 1990, Science 244: 1293; Potrykus, 1990, Biotechnology 8: 535;
Shimamoto et al., 1989, Nature 338: 274).

Presently, Agrobacterium tumefaciens-mediated gene transfer is the method of choice for
generating transgenic dicots (for a review, see Hooykas and Schilperoort, 1992, Plant Molecular
Biology 19: 15-38). However, it can also be used for transforming monocots, although other
transformation methods are generally preferred for these plants. Presently, the method of choice
for generating transgenic monocots is particle bombardment (microscopic gold or tungsten
particles coated with the transforming DNA) of embryonic calli or developing embryos (Christou,
1992, Plant Journal 2: 275-281; Shimamoto, 1994, Current Opinion Biotechnology 5: 158-162;
Vasil et al., 1992, Biotechnology 10: 667-674). An alternative method for transformation of
monocots is based on protoplast transformation as described by Omirulleh et al., 1993, Plant
Molecular Biology 21: 415-428.

Following transformation, the transformants having incorporated therein the expression 
construct are selected and regenerated into whole plants according to methods well-known in the 
art. 

The present invention also relates to methods for producing a polypeptide of the present
invention comprising (a) cultivating a transgenic plant or a plant cell comprising a nucleic acid
sequence encoding a polypeptide having protease activity of the present invention under
conditions conducive for production of the polypeptide; and (b) recovering the polypeptide. The
invention also relates to a transgenic plant, or plant part, comprising a polynucleotide as defined in
claim 8, or an expression vector or polynucleotide construct of the invention.

Animals 

The present invention also relates to a transgenic, non-human animal and products or
elements thereof, examples of which are body fluids such as milk and blood, organs, flesh, and
animal cells. Techniques for expressing proteins, e.g. in mammalian cells, are known in the art;
see e.g. the handbook Protein Expression: A Practical Approach, Higgins and Hames (eds.),
Oxford University Press (1999), and the three other handbooks in this series relating to Gene
Transcription, RNA Processing, and Post-translational Processing. Generally speaking, to prepare
a transgenic animal, selected cells of a selected animal are transformed with a nucleic acid
sequence encoding a polypeptide having protease activity of the present invention so as to
express and produce the polypeptide. The polypeptide may be recovered from the animal, e.g.
from the milk of female animals, or the polypeptide may be expressed to the benefit of the animal
itself, e.g. to assist the animal's digestion. Examples of animals are mentioned below in the
section headed Animal Feed.

To produce a transgenic animal with a view to recovering the protease from the milk of the
animal, a gene encoding the protease may be inserted into the fertilized eggs of the animal in
question, e.g. by use of a transgene expression vector which comprises a suitable milk protein
promoter and the gene encoding the protease. The transgene expression vector is microinjected
into fertilized eggs, and preferably permanently integrated into the chromosome. Once the egg
begins to grow and divide, the potential embryo is implanted into a surrogate mother, and animals
carrying the transgene are identified. The resulting animal can then be multiplied by conventional
breeding. The polypeptide may be purified from the animal's milk; see e.g. Meade, H.M. et al.
(1999): Expression of recombinant proteins in the milk of transgenic animals. In: Gene expression
systems: Using nature for the art of expression, J. M. Fernandez and J. P. Hoeffler (eds.),
Academic Press.

In the alternative, in order to produce a transgenic non-human animal that carries in the 
genome of its somatic and/or germ cells a nucleic acid sequence including a heterologous 
transgene construct including a transgene encoding the protease, the transgene may be operably 
linked to a first regulatory sequence for salivary gland specific expression of the protease, as 
disclosed in WO 2000064247. 

The invention relates to a transgenic, non-human animal, or products, or elements thereof, 
comprising a polynucleotide, or an expression vector or polynucleotide construct of the invention. 



The present invention also relates to methods for producing a polypeptide of the present
invention comprising (a) cultivating a host cell or a transgenic plant or animal under conditions
conducive for production of the polypeptide in a supernatant; and optionally (b) recovering the
polypeptide.

In the production methods of the present invention, the cells are cultivated in a nutrient
medium suitable for production of the polypeptide using methods known in the art. For example,
the cells may be cultivated by shake flask cultivation or by small-scale or large-scale fermentation
(including continuous, batch, fed-batch, or solid state fermentations) in laboratory or industrial
fermentors performed in a suitable medium and under conditions allowing the polypeptide to be
expressed and/or isolated. The cultivation takes place in a suitable nutrient medium comprising
carbon and nitrogen sources and inorganic salts, using procedures known in the art. Suitable
media are available from commercial suppliers or may be prepared according to published
compositions (e.g., in catalogues of the American Type Culture Collection). If the polypeptide is
secreted into the nutrient medium, the polypeptide can be recovered directly from the medium. If
the polypeptide is not secreted, it can be recovered from cell lysates.

The polypeptides may be detected using methods known in the art that are specific for the
polypeptides. These detection methods may include use of specific antibodies, formation of a
product, or disappearance of a substrate. For example, a protease assay may be used to
determine the activity of the polypeptide as described herein.

The resulting polypeptide may be recovered by methods known in the art. For example,
the polypeptide may be recovered from the nutrient medium by conventional procedures including,
but not limited to, centrifugation, filtration, extraction, spray-drying, evaporation, or precipitation.

The polypeptides of the present invention may be purified by a variety of procedures
known in the art including, but not limited to, chromatography (e.g., ion exchange, affinity,
hydrophobic, chromatofocusing, and size exclusion), electrophoretic procedures (e.g., preparative
isoelectric focusing), differential solubility (e.g., ammonium sulfate precipitation), SDS-PAGE, or
extraction (see, e.g., Protein Purification, J.-C. Janson and Lars Ryden, editors, VCH Publishers,
New York, 1989).

Compositions 

In a still further aspect, the present invention relates to compositions comprising a
polypeptide of the present invention. The polypeptide compositions may be prepared in
accordance with methods known in the art and may be in the form of a liquid or a dry composition.
For instance, the polypeptide composition may be in the form of a granulate or a microgranulate.
The polypeptide to be included in the composition may be stabilized in accordance with methods
known in the art. Examples are given below of preferred uses of the polypeptides or polypeptide
compositions of the invention.

Animal Feed 

The present invention is also directed to methods for using the polypeptides of the
invention in animal feed, as well as to feed compositions and feed additives comprising the
polypeptides of the invention. The term animal includes all animals, including human beings.
Examples of animals are non-ruminants and ruminants, such as cows, sheep and horses. In a
particular embodiment, the animal is a non-ruminant animal. Non-ruminant animals include
monogastric animals, e.g. pigs or swine (including, but not limited to, piglets, growing pigs, and sows);
poultry such as turkeys, ducks and chicken (including but not limited to broiler chicks and layers);
young calves; fish (including but not limited to salmon, trout, tilapia, catfish and carps); and
crustaceans (including but not limited to shrimps and prawns).

The term feed or feed composition means any compound, preparation, mixture, or
composition suitable for, or intended for, intake by an animal.

In the use according to the invention the protease can be fed to the animal before, after, or
simultaneously with the diet. The latter is preferred.

In a particular embodiment, the protease, in the form in which it is added to the feed, or
when being included in a feed additive, is well-defined. Well-defined means that the protease
preparation is at least 50% pure as determined by size-exclusion chromatography (see Example
12 of WO 01/58275). In other particular embodiments the protease preparation is at least 60, 70,
80, 85, 88, 90, 92, 94, or at least 95% pure as determined by this method. A well-defined protease
preparation is advantageous. For instance, it is much easier to dose correctly to the feed a
protease that is essentially free from other, interfering or contaminating, proteases. The term dose
correctly refers in particular to the objective of obtaining consistent and constant results, and the
capability of optimising dosage based upon the desired effect.
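As a rough illustration of the purity criterion, the sketch below assumes that purity is read as the protease peak's share of the total integrated peak area in a size-exclusion chromatogram. The actual procedure is the one in Example 12 of WO 01/58275, which is not reproduced here; the function names and peak areas are hypothetical.

```python
from typing import Sequence


def sec_purity_percent(protease_peak_area: float, all_peak_areas: Sequence[float]) -> float:
    """Protease peak area as a percentage of the total integrated peak area.

    Assumption (hypothetical helper): purity is the area fraction of the
    protease peak in a size-exclusion chromatogram.
    """
    total = sum(all_peak_areas)
    if total <= 0:
        raise ValueError("total peak area must be positive")
    return 100.0 * protease_peak_area / total


def is_well_defined(purity_percent: float, threshold: float = 50.0) -> bool:
    """True if the preparation meets the purity threshold (default: 50%)."""
    return purity_percent >= threshold


# Illustrative peak areas only; the first peak is assumed to be the protease.
areas = [880.0, 70.0, 50.0]
purity = sec_purity_percent(areas[0], areas)
print(round(purity, 1), is_well_defined(purity), is_well_defined(purity, 95.0))
```

Passing a higher `threshold` checks the stricter embodiments (e.g. 95%) with the same function.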

For the use in animal feed, however, the protease need not be that pure; it may, e.g.,
include other enzymes, in which case it could be termed a protease preparation. The protease
preparation can be (a) added directly to the feed (or used directly in a treatment process of
vegetable proteins), or (b) used in the production of one or more intermediate
compositions such as feed additives or premixes that are subsequently added to the feed (or used
in a treatment process). The degree of purity described above refers to the purity of the original
protease preparation, whether used according to (a) or (b) above.

Protease preparations with purities of this order of magnitude are in particular obtainable
using recombinant methods of production, whereas they are not so easily obtained and also
subject to a much higher batch-to-batch variation when the protease is produced by traditional
fermentation methods. Such a protease preparation may of course be mixed with other enzymes.

In a particular embodiment, the protease for use according to the invention is capable of 
solubilising vegetable proteins. A suitable assay for determining solubilised protein is disclosed in 
Example 11. 

The term vegetable proteins as used herein refers to any compound, composition,
preparation or mixture that includes at least one protein derived from or originating from a
vegetable, including modified proteins and protein-derivatives. In particular embodiments, the
protein content of the vegetable proteins is at least 10, 20, 30, 40, 50, or 60% (w/w). Vegetable
proteins may be derived from vegetable protein sources, such as legumes and cereals, for
example materials from plants of the families Fabaceae (Leguminosae), Cruciferaceae,
Chenopodiaceae, and Poaceae, such as soy bean meal, lupin meal and rapeseed meal. In a
particular embodiment the vegetable protein source is material from one or more plants of the
family Fabaceae, e.g. soybean, lupine, pea, or bean.

In another particular embodiment, the vegetable protein source is material from one or
more plants of the family Chenopodiaceae, e.g. beet, sugar beet, spinach or quinoa. Other
examples of vegetable protein sources are rapeseed and cabbage. Soybean is a preferred
vegetable protein source. Other examples of vegetable protein sources are cereals such as
barley, wheat, rye, oat, maize (corn), rice, and sorghum.

The treatment according to the invention of vegetable proteins with at least one protease of
the invention results in an increased solubilisation of vegetable proteins. The following are
examples of % solubilised protein obtainable using the proteases of the invention in a monogastric
in vitro model: at least 102%, 103%, 104%, 105%, 106%, or at least 107%, relative to a blank.
The percentage of solubilised protein is determined using the monogastric in vitro model of
Example 11. The term solubilisation of proteins basically means bringing protein(s) into solution.
Such solubilisation may be due to protease-mediated release of protein from other components of
the usually complex natural compositions such as feed.

In a further particular embodiment, the protease for use according to the invention is
capable of increasing the amount of digestible vegetable proteins. The following are examples of
% digested or digestible protein obtainable using the proteases of the invention in a monogastric
in vitro model: at least 104%, 105%, 106%, 107%, 108%, 109%, or at least 110%, relative to a
blank. The percentage of digested or digestible protein is determined using the in vitro model of
Example 11.

The following are examples of % digested or digestible protein obtainable using the
proteases of the invention in an aquaculture in vitro model: at least 103%, 104%, 105%, 106%,
107%, 108%, 109%, or at least 110%, relative to a blank. The percentage of digested or digestible
protein is determined using the aquaculture in vitro model of Example 12.

In a still further particular embodiment, the protease for use according to the invention is
capable of increasing the Degree of Hydrolysis (DH) of vegetable proteins. The following are
examples of Degree of Hydrolysis increase obtainable in a monogastric in vitro model: at least
102%, 103%, 104%, 105%, 106%, or at least 107%, relative to a blank. The DH is determined
using the monogastric in vitro model of Example 11. The following are examples of Degree of
Hydrolysis increase obtainable in an aquaculture in vitro model: at least 102%, 103%, 104%,
105%, 106%, or at least 107%, relative to a blank. The DH is determined using the aquaculture in
vitro model of Example 12.
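The solubilisation, digestibility and DH figures above are all expressed relative to a blank. A minimal sketch, assuming that "relative to a blank" means the treated value divided by the blank value measured in the same in vitro model and expressed in percent; the helper names and numbers are illustrative only and are not data from Examples 11 or 12.

```python
def percent_of_blank(treated: float, blank: float) -> float:
    """Express a treated-sample measurement as a percentage of the blank.

    Assumption: treated and blank values come from the same in vitro model
    and are in the same units (solubilised protein, digested protein or DH).
    """
    if blank <= 0:
        raise ValueError("blank value must be positive")
    return 100.0 * treated / blank


def meets_level(treated: float, blank: float, minimum_percent: float) -> bool:
    """True if the treated sample reaches at least `minimum_percent` of the blank."""
    return percent_of_blank(treated, blank) >= minimum_percent


# Illustrative numbers only
print(percent_of_blank(53.5, 50.0))     # 107.0
print(meets_level(53.5, 50.0, 105.0))   # True
```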

In a particular embodiment of a (pre-)treatment process of the invention, the protease(s) in
question affect (or act on, or exert their solubilising influence on) the vegetable proteins
or protein sources. To achieve this, the vegetable protein or protein source is typically suspended
in a solvent, e.g. an aqueous solvent such as water, and the pH and temperature values are
adjusted paying due regard to the characteristics of the enzyme in question. For example, the
treatment may take place at a pH value at which the activity of the actual protease is at least
40%, 50%, 60%, 70%, 80% or at least 90%. Likewise, for example, the treatment may take
place at a temperature at which the activity of the actual protease is at least 40%, 50%, 60%,
70%, 80% or at least 90%. The above percentage activity indications are relative to the maximum
activities. The enzymatic reaction is continued until the desired result is achieved, following which
it may or may not be stopped by inactivating the enzyme, e.g. by a heat-treatment step.
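The activity thresholds used to choose the treatment pH and temperature can be illustrated as follows. This sketch assumes that an activity profile is available as a mapping from condition (pH or temperature) to measured activity; the profile values and the function names are hypothetical.

```python
from typing import Dict, List


def relative_activity(profile: Dict[float, float]) -> Dict[float, float]:
    """Convert an activity profile {condition: activity} to % of the maximum activity."""
    peak = max(profile.values())
    if peak <= 0:
        raise ValueError("profile must contain a positive activity")
    return {cond: 100.0 * act / peak for cond, act in profile.items()}


def usable_conditions(profile: Dict[float, float], min_percent: float) -> List[float]:
    """Conditions (e.g. pH values) at which activity is >= min_percent of maximum."""
    rel = relative_activity(profile)
    return sorted(cond for cond, pct in rel.items() if pct >= min_percent)


# Hypothetical pH-activity profile (arbitrary units, not measured data)
ph_profile = {3.0: 10.0, 5.0: 45.0, 7.0: 80.0, 9.0: 100.0, 11.0: 60.0}
print(usable_conditions(ph_profile, 40.0))  # [5.0, 7.0, 9.0, 11.0]
print(usable_conditions(ph_profile, 90.0))  # [9.0]
```

The same functions apply unchanged to a temperature profile, since only the keys differ.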

In another particular embodiment of a treatment process of the invention, the protease
action is sustained, meaning e.g. that the protease is added to the vegetable proteins or protein
sources, but its solubilising influence is, so to speak, not switched on until later when desired,
once suitable solubilising conditions are established, once any enzyme inhibitors are inactivated,
or by whatever other means could have been applied to postpone the action of the enzyme.

In one embodiment the treatment is a pre-treatment of animal feed or vegetable proteins 
for use in animal feed. 

The term improving the nutritional value of an animal feed means improving the availability 
and/or digestibility of the proteins, thereby leading to increased protein extraction from the diet 
components, higher protein yields, increased protein degradation and/or improved protein 
utilisation. The nutritional value of the feed is therefore increased, and the animal performance 
such as growth rate and/or weight gain and/or feed conversion ratio (i.e. the weight of ingested 
feed relative to weight gain) of the animal is/are improved. 

In a particular embodiment the feed conversion ratio is increased by at least 1%, 2%, 3%,
4%, 5%, 6%, 7%, 8%, 9% or at least 10%. In a further particular embodiment the weight gain is
increased by at least 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10% or at least 11%. These figures are
relative to control experiments with no protease addition.
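The FCR definition given above (weight of ingested feed relative to weight gain) and the comparison with a no-protease control can be sketched as below. The pen data are made up and the helper names are hypothetical.

```python
def feed_conversion_ratio(feed_intake_kg: float, weight_gain_kg: float) -> float:
    """FCR = weight of ingested feed / weight gain, as defined in the text."""
    if weight_gain_kg <= 0:
        raise ValueError("weight gain must be positive")
    return feed_intake_kg / weight_gain_kg


def percent_change(treated: float, control: float) -> float:
    """Signed relative change of a treated value versus the control, in %."""
    return 100.0 * (treated - control) / control


# Hypothetical pen data (illustrative only), kg per bird
control_gain, treated_gain = 2.00, 2.10
control_feed, treated_feed = 3.60, 3.57
fcr_control = feed_conversion_ratio(control_feed, control_gain)  # about 1.80
fcr_treated = feed_conversion_ratio(treated_feed, treated_gain)  # about 1.70
print(round(percent_change(treated_gain, control_gain), 1))      # 5.0
print(round(percent_change(fcr_treated, fcr_control), 1))        # -5.6
```

Because FCR is feed per unit of weight gain, a negative change here means less feed was needed per kilogram of gain; the helper simply reports the signed change against the control.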

The feed conversion ratio (FCR) and the weight gain may be calculated as described in
EEC (1986): Directive de la Commission du 9 avril 1986 fixant la méthode de calcul de la valeur
énergétique des aliments composés destinés à la volaille [Commission Directive of 9 April 1986
laying down the method of calculating the energy value of compound feed intended for poultry].
Journal Officiel des Communautés Européennes, L130, 53-54.

The protease can be added to the feed in any form, be it as a relatively pure protease or in 
admixture with other components intended for addition to animal feed, i.e. in the form of animal 
feed additives, such as the so-called pre-mixes for animal feed. 

In a further aspect the present invention relates to compositions for use in animal feed, 
such as animal feed, and animal feed additives, e.g. premixes. 

Apart from the protease of the invention, the animal feed additives of the invention contain 
at least one fat-soluble vitamin, and/or at least one water soluble vitamin, and/or at least one trace 
mineral. The feed additive may also contain at least one macro mineral. 

Further, optional, feed-additive ingredients are colouring agents, aroma compounds,
stabilisers, antimicrobial peptides, including antifungal polypeptides, and/or at least one other
enzyme selected from amongst phytase (EC 3.1.3.8 or 3.1.3.26); xylanase (EC 3.2.1.8);
galactanase (EC 3.2.1.89); alpha-galactosidase (EC 3.2.1.22); protease (EC 3.4.-.-);
phospholipase A1 (EC 3.1.1.32); phospholipase A2 (EC 3.1.1.4); lysophospholipase (EC 3.1.1.5);
phospholipase C (EC 3.1.4.3); phospholipase D (EC 3.1.4.4); and/or beta-glucanase (EC 3.2.1.4 or
EC 3.2.1.6).

In a particular embodiment these other enzymes are well-defined (as defined above for 
protease preparations). 

Examples of antimicrobial peptides (AMPs) are CAP18, Leucocin A, Tritrpticin, Protegrin-
1, Thanatin, Defensin, Lactoferrin, Lactoferricin, and Ovispirin such as Novispirin (Robert Lehrer,
2000), Plectasins, and Statins, including the compounds and polypeptides disclosed in
PCT/DK02/00781 and PCT/DK02/00812, as well as variants or fragments of the above that retain
antimicrobial activity.

Examples of antifungal polypeptides (AFPs) are the Aspergillus giganteus and Aspergillus
niger peptides, as well as variants and fragments thereof which retain antifungal activity, as
disclosed in WO 94/01459 and WO 02/090384.

Usually fat- and water-soluble vitamins, as well as trace minerals, form part of a so-called
premix intended for addition to the feed, whereas macro minerals are usually separately added to
the feed. A premix enriched with a protease of the invention is an example of an animal feed
additive of the invention.



In a particular embodiment, the animal feed additive of the invention is intended for being
included (or prescribed as having to be included) in animal diets or feed at levels of 0.01 to 10.0%;
more particularly 0.05 to 5.0%; or 0.2 to 1.0% (% meaning g additive per 100 g feed). This is so in
particular for premixes.
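The inclusion levels are given as g additive per 100 g feed. A small sketch converting them to kilograms of additive per tonne of feed and checking them against one of the stated ranges; the function names are hypothetical.

```python
def additive_kg_per_tonne(inclusion_percent: float) -> float:
    """Convert an inclusion level in % (g additive per 100 g feed) to kg per tonne of feed."""
    # 1% = 1 g/100 g = 10 g/kg = 10 kg per 1000 kg of feed
    return inclusion_percent * 10.0


def within_range(inclusion_percent: float, low: float = 0.01, high: float = 10.0) -> bool:
    """Check an inclusion level against a stated range (default 0.01-10.0%)."""
    return low <= inclusion_percent <= high


print(additive_kg_per_tonne(0.5))   # 5.0
print(within_range(0.5, 0.2, 1.0))  # True
```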

The following are non-exclusive lists of examples of these components: 

Examples of fat-soluble vitamins are vitamin A, vitamin D3, vitamin E, and vitamin K, e.g.
vitamin K3.

Examples of water-soluble vitamins are vitamin B12, biotin and choline, vitamin B1, vitamin
B2, vitamin B6, niacin, folic acid and pantothenate, e.g. Ca-D-pantothenate.

Examples of trace minerals are manganese, zinc, iron, copper, iodine, selenium, and
cobalt.

Examples of macro minerals are calcium, phosphorus and sodium.

The nutritional requirements of these components (exemplified with poultry and
piglets/pigs) are listed in Table A of WO 01/58275. Nutritional requirement means that these
components should be provided in the diet in the concentrations indicated.

In the alternative, the animal feed additive of the invention comprises at least one of the
individual components specified in Table A of WO 01/58275. At least one means one or
more of, e.g. one, or two, or three, or four and so forth up to all thirteen, or up to all fifteen, individual
components. More specifically, this at least one individual component is included in the additive of
the invention in such an amount as to provide an in-feed concentration within the range indicated
in column four, or column five, or column six of Table A.

The present invention also relates to animal feed compositions. Animal feed compositions
or diets have a relatively high content of protein. Poultry and pig diets can be characterised as
indicated in Table B of WO 01/58275, columns 2-3. Fish diets can be characterised as indicated in
column 4 of this Table B. Furthermore, such fish diets usually have a crude fat content of 200-310
g/kg. WO 01/58275 corresponds to US 09/779334, which is hereby incorporated by reference.

An animal feed composition according to the invention has a crude protein content of 50- 
800 g/kg, and furthermore comprises at least one protease as claimed herein. 

Furthermore, or in the alternative (to the crude protein content indicated above), the animal
feed composition of the invention has a content of metabolisable energy of 10-30 MJ/kg; and/or a
content of calcium of 0.1-200 g/kg; and/or a content of available phosphorus of 0.1-200 g/kg;
and/or a content of methionine of 0.1-100 g/kg; and/or a content of methionine plus cysteine of
0.1-150 g/kg; and/or a content of lysine of 0.5-50 g/kg.
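These compositional limits, together with the 50-800 g/kg crude protein content stated above, can be checked mechanically. A sketch, assuming a diet analysis is supplied as a dictionary; the parameter names are hypothetical labels for the quantities listed in the text, and only parameters actually supplied are checked, mirroring the "and/or" wording.

```python
from typing import Dict, Tuple

# Ranges as stated in the text (MJ/kg for energy, g/kg otherwise)
FEED_RANGES: Dict[str, Tuple[float, float]] = {
    "metabolisable_energy_MJ_per_kg": (10.0, 30.0),
    "crude_protein_g_per_kg": (50.0, 800.0),
    "calcium_g_per_kg": (0.1, 200.0),
    "available_phosphorus_g_per_kg": (0.1, 200.0),
    "methionine_g_per_kg": (0.1, 100.0),
    "methionine_plus_cysteine_g_per_kg": (0.1, 150.0),
    "lysine_g_per_kg": (0.5, 50.0),
}


def out_of_range(diet: Dict[str, float]) -> Dict[str, float]:
    """Return the supplied parameters that fall outside the stated ranges.

    Parameters without a stated range are ignored.
    """
    return {
        name: value
        for name, value in diet.items()
        if name in FEED_RANGES
        and not (FEED_RANGES[name][0] <= value <= FEED_RANGES[name][1])
    }


# Hypothetical diet analysis (illustrative values)
diet = {"crude_protein_g_per_kg": 215.0, "lysine_g_per_kg": 12.0,
        "metabolisable_energy_MJ_per_kg": 8.5}
print(out_of_range(diet))  # {'metabolisable_energy_MJ_per_kg': 8.5}
```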

In particular embodiments, the content of metabolisable energy, crude protein, calcium,
phosphorus, methionine, methionine plus cysteine, and/or lysine is within any one of ranges 2, 3,
4 or 5 in Table B of WO 01/58275 (R. 2-5).
Crude protein is calculated as nitrogen (N) multiplied by a factor of 6.25, i.e. Crude protein
(g/kg) = N (g/kg) x 6.25. The nitrogen content is determined by the Kjeldahl method (A.O.A.C.,
1984, Official Methods of Analysis, 14th ed., Association of Official Analytical Chemists,
Washington DC).
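The crude protein calculation is a single multiplication. A minimal sketch, assuming the nitrogen content (g/kg) has already been determined by the Kjeldahl method; the function name is hypothetical.

```python
def crude_protein_g_per_kg(nitrogen_g_per_kg: float, factor: float = 6.25) -> float:
    """Crude protein (g/kg) = N (g/kg) x 6.25, with N from a Kjeldahl determination."""
    return nitrogen_g_per_kg * factor


cp = crude_protein_g_per_kg(32.0)
print(cp)                   # 200.0
print(50.0 <= cp <= 800.0)  # True: within the 50-800 g/kg crude protein range
```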

Metabolisable energy can be calculated on the basis of the NRC publication Nutrient
requirements in swine, ninth revised edition 1988, subcommittee on swine nutrition, committee on
animal nutrition, board of agriculture, national research council. National Academy Press,
Washington, D.C., pp. 2-6, and the European Table of Energy Values for Poultry Feed-stuffs,
Spelderholt centre for poultry research and extension, 7361 DA Beekbergen, The Netherlands;
Grafisch bedrijf Ponsen & Looijen bv, Wageningen. ISBN 90-71463-12-5.

The dietary content of calcium, available phosphorus and amino acids in complete animal
diets is calculated on the basis of feed tables such as Veevoedertabel 1997, gegevens over
chemische samenstelling, verteerbaarheid en voederwaarde van voedermiddelen, Central
Veevoederbureau, Runderweg 6, 8219 PK Lelystad. ISBN 90-72839-13-7.

In a particular embodiment, the animal feed composition of the invention contains at least 
one vegetable protein or protein source as defined above. 
In still further particular embodiments, the animal feed composition of the invention
contains 0-80% maize; and/or 0-80% sorghum; and/or 0-70% wheat; and/or 0-70% barley; and/or
0-30% oats; and/or 0-40% soybean meal; and/or 0-10% fish meal; and/or 0-20% whey.

Animal diets can e.g. be manufactured as mash feed (non-pelleted) or pelleted feed.
Typically, the milled feed-stuffs are mixed and sufficient amounts of essential vitamins and
minerals are added according to the specifications for the species in question. Enzymes can be
added as solid or liquid enzyme formulations. For example, a solid enzyme formulation is typically
added before or during the mixing step; and a liquid enzyme preparation is typically added after
the pelleting step. The enzyme may also be incorporated in a feed additive or premix.

The final enzyme concentration in the diet is within the range of 0.01-200 mg enzyme
protein per kg diet, for example in the range of 0.5-25 mg enzyme protein per kg animal diet.

The protease should of course be applied in an effective amount, i.e. in an amount 
adequate for improving solubilisation and/or improving nutritional value of feed. It is at present 
contemplated that the enzyme is administered in one or more of the following amounts (dosage 
ranges): 0.01-200; 0.01-100; 0.5-100; 1-50; 5-100; 10-100; 0.05-50; or 0.10-10 - all these ranges 
being in mg protease enzyme protein per kg feed (ppm). 

For determining mg enzyme protein per kg feed, the protease is purified from the feed
composition, and the specific activity of the purified protease is determined using a relevant assay
(see under protease activity, substrates, and assays). The protease activity of the feed
composition as such is also determined using the same assay, and on the basis of these two
determinations, the dosage in mg enzyme protein per kg feed is calculated.
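The dosage determination combines the two activity measurements made with the same assay. A minimal sketch, assuming activity is reported in generic "units" (the specific assay is described elsewhere in the document) and that the specific activity is expressed per mg of enzyme protein; names and numbers are hypothetical.

```python
def dose_mg_enzyme_protein_per_kg(feed_activity_units_per_kg: float,
                                  specific_activity_units_per_mg: float) -> float:
    """mg enzyme protein per kg feed (ppm), from two measurements with the same assay.

    feed_activity_units_per_kg: protease activity measured in the feed (units/kg)
    specific_activity_units_per_mg: activity of the purified protease (units/mg protein)
    """
    if specific_activity_units_per_mg <= 0:
        raise ValueError("specific activity must be positive")
    return feed_activity_units_per_kg / specific_activity_units_per_mg


dose = dose_mg_enzyme_protein_per_kg(4000.0, 400.0)
print(dose)                 # 10.0 mg/kg, i.e. 10 ppm or 10 g per tonne of feed
print(0.5 <= dose <= 25.0)  # True: inside the 0.5-25 mg/kg example range
```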

The same principles apply for determining mg enzyme protein in feed additives. Of course,
if a sample is available of the protease used for preparing the feed additive or the feed, the
specific activity is determined from this sample (no need to purify the protease from the feed
composition or the additive).

The present invention is further described by the following examples which should not be 
construed as limiting the scope of the invention. 

Detergent Compositions

The protease of the invention may be added to and thus become a component of a
detergent composition. The detergent composition of the invention may for example be formulated
as a hand or machine laundry detergent composition, including a laundry additive composition
suitable for pre-treatment of stained fabrics and a rinse added fabric softener composition, or be
formulated as a detergent composition for use in general household hard surface cleaning
operations, or be formulated for hand or machine dishwashing operations.
In a specific aspect, the invention provides a detergent additive comprising the protease of
the invention. The detergent additive as well as the detergent composition may comprise one or
more other enzymes such as another protease, such as alkaline proteases from Bacillus, a lipase,
a cutinase, an amylase, a carbohydrase, a cellulase, a pectinase, a mannanase, an arabinase, a
galactanase, a xylanase, an oxidase, e.g., a laccase, and/or a peroxidase.

In general, the properties of the chosen enzyme(s) should be compatible with the selected
detergent (i.e. pH optimum, compatibility with other enzymatic and non-enzymatic ingredients,
etc.), and the enzyme(s) should be present in effective amounts.

Suitable lipases include those of bacterial or fungal origin. Chemically modified or protein
engineered mutants are included. Examples of useful lipases include lipases from Humicola
(synonym Thermomyces), e.g. from H. lanuginosa (T. lanuginosus) as described in EP 258068
and EP 305216 or from H. insolens as described in WO 96/13580, a Pseudomonas lipase, e.g.
from P. alcaligenes or P. pseudoalcaligenes (EP 218272), P. cepacia (EP 331376), P. stutzeri
(GB 1,372,034), P. fluorescens, Pseudomonas sp. strain SD 705 (WO 95/06720 and WO
96/27002), P. wisconsinensis (WO 96/12012), a Bacillus lipase, e.g. from B. subtilis (Dartois et al.
(1993), Biochemica et Biophysica Acta, 1131, 253-360), B. stearothermophilus (JP 64/744992) or
B. pumilus (WO 91/16422). Other examples are lipase variants such as those described in WO
92/05249, WO 94/01541, EP 407225, EP 260105, WO 95/35381, WO 96/00292, WO 95/30744,
WO 94/25578, WO 95/14783, WO 95/22615, WO 97/04079 and WO 97/07202. Preferred
commercially available lipase enzymes include Lipolase™ and Lipolase Ultra™ (Novozymes A/S).

Suitable amylases (alpha- and/or beta-) include those of bacterial or fungal origin.
Chemically modified or protein engineered mutants are included. Amylases include, for example,
alpha-amylases obtained from Bacillus, e.g. a special strain of B. licheniformis, described in more
detail in GB 1,296,839. Examples of useful amylases are the variants described in WO 94/02597,
WO 94/18314, WO 96/23873, and WO 97/43424, especially the variants with substitutions in one
or more of the following positions: 15, 23, 105, 106, 124, 128, 133, 154, 156, 181, 188, 190, 197,
202, 208, 209, 243, 264, 304, 305, 391, 408, and 444. Commercially available amylases are
Duramyl™, Termamyl™, Fungamyl™ and BAN™ (Novozymes A/S), and Rapidase™ and Purastar™
(from Genencor International Inc.).

Suitable cellulases include those of bacterial or fungal origin. Chemically modified or
protein engineered mutants are included. Suitable cellulases include cellulases from the genera
Bacillus, Pseudomonas, Humicola, Fusarium, Thielavia, Acremonium, e.g. the fungal cellulases
produced from Humicola insolens, Myceliophthora thermophila and Fusarium oxysporum
disclosed in US 4,435,307, US 5,648,263, US 5,691,178, US 5,776,757 and WO 89/09259.
Especially suitable cellulases are the alkaline or neutral cellulases having colour care benefits.
Examples of such cellulases are cellulases described in EP 0 495 257, EP 531372, WO 96/11262,
WO 96/29397 and WO 98/08940. Other examples are cellulase variants such as those described in
WO 94/07998, EP 0 531 315, US 5,457,046, US 5,686,593, US 5,763,254, WO 95/24471,
WO 98/12307 and WO 99/01544. Commercially available cellulases include Celluzyme™ and
Carezyme™ (Novozymes A/S), Clazinase™ and Puradax HA™ (Genencor International Inc.), and
KAC-500(B)™ (Kao Corporation).

Suitable peroxidases/oxidases include those of plant, bacterial or fungal origin. Chemically
modified or protein engineered mutants are included. Examples of useful peroxidases include
peroxidases from Coprinus, e.g. from C. cinereus, and variants thereof such as those described in WO
93/24618, WO 95/10602, and WO 98/15257. Commercially available peroxidases include
Guardzyme™ (Novozymes).

The detergent enzyme(s) may be included in a detergent composition by adding separate 
additives containing one or more enzymes, or by adding a combined additive comprising all of 
these enzymes. A detergent additive of the invention, i.e. a separate additive or a combined 
additive, can be formulated e.g. as a granulate, a liquid, a slurry, etc. Preferred detergent additive 
formulations are granulates, in particular non-dusting granulates, liquids, in particular stabilized 
liquids, or slurries. 

Non-dusting granulates may be produced, e.g., as disclosed in US 4,106,991 and
4,661,452 and may optionally be coated by methods known in the art. Examples of waxy coating
materials are poly(ethylene oxide) products (polyethyleneglycol, PEG) with mean molar weights of
1000 to 20000; ethoxylated nonylphenols having from 16 to 50 ethylene oxide units; ethoxylated
fatty alcohols in which the alcohol contains from 12 to 20 carbon atoms and in which there are 15
to 80 ethylene oxide units; fatty alcohols; fatty acids; and mono-, di- and triglycerides of fatty
acids. Examples of film-forming coating materials suitable for application by fluid bed techniques
are given in GB 1483591. Liquid enzyme preparations may, for instance, be stabilized by adding a
polyol such as propylene glycol, a sugar or sugar alcohol, lactic acid or boric acid according to
established methods.

The detergent composition of the invention may be in any convenient form, e.g. a bar, a
tablet, a powder, a granule, a paste or a liquid. A liquid detergent may be aqueous, typically
containing up to 70% water and 0-30% organic solvent, or non-aqueous.

The detergent composition comprises one or more surfactants, which may be non-ionic including semi-polar and/or anionic and/or cationic and/or zwitterionic. The surfactants are typically present at a level of from 0.1% to 60% by weight.

When included therein the detergent will usually contain from about 1% to about 40% of an anionic surfactant such as linear alkylbenzenesulfonate, alpha-olefinsulfonate, alkyl sulfate (fatty alcohol sulfate), alcohol ethoxysulfate, secondary alkanesulfonate, alpha-sulfo fatty acid methyl ester, alkyl- or alkenylsuccinic acid or soap.

When included therein the detergent will usually contain from about 0.2% to about 40% of a non-ionic surfactant such as alcohol ethoxylate, nonylphenol ethoxylate, alkylpolyglycoside, alkyldimethylamineoxide, ethoxylated fatty acid monoethanolamide, fatty acid monoethanolamide, polyhydroxy alkyl fatty acid amide, or N-acyl N-alkyl derivatives of glucosamine ("glucamides").

The detergent may contain 0-65% of a detergent builder or complexing agent such as zeolite, diphosphate, triphosphate, phosphonate, carbonate, citrate, nitrilotriacetic acid, ethylenediaminetetraacetic acid, diethylenetriaminepentaacetic acid, alkyl- or alkenylsuccinic acid, soluble silicates or layered silicates (e.g. SKS-6 from Hoechst).

The detergent may comprise one or more polymers. Examples are carboxymethylcellulose, poly(vinylpyrrolidone), poly(ethylene glycol), poly(vinyl alcohol), poly(vinylpyridine-N-oxide), poly(vinylimidazole), polycarboxylates such as polyacrylates, maleic/acrylic acid copolymers and lauryl methacrylate/acrylic acid copolymers.

The detergent may contain a bleaching system which may comprise an H2O2 source such as perborate or percarbonate, which may be combined with a peracid-forming bleach activator such as tetraacetylethylenediamine or nonanoyloxybenzenesulfonate. Alternatively, the bleaching system may comprise peroxyacids of e.g. the amide, imide, or sulfone type.

The enzyme(s) of the detergent composition of the invention may be stabilized using conventional stabilizing agents, e.g. a polyol such as propylene glycol or glycerol, a sugar or sugar alcohol, lactic acid, boric acid, or a boric acid derivative, e.g. an aromatic borate ester, or a phenyl boronic acid derivative such as 4-formylphenyl boronic acid, and the composition may be formulated as described in e.g. WO 92/19709 and WO 92/19708.

The detergent may also contain other conventional detergent ingredients such as e.g. 
fabric conditioners including clays, foam boosters, suds suppressors, anti-corrosion agents, soil- 
suspending agents, anti-soil redeposition agents, dyes, bactericides, optical brighteners, 
hydrotropes, tarnish inhibitors, or perfumes.

It is at present contemplated that in the detergent compositions any enzyme, in particular the enzyme of the invention, may be added in an amount corresponding to 0.01-100 mg of enzyme protein per liter of wash liquor, preferably 0.05-5 mg of enzyme protein per liter of wash liquor, in particular 0.1-1 mg of enzyme protein per liter of wash liquor.
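As an illustration of the dosage ranges just given, the short sketch below converts a dose in mg enzyme protein per liter of wash liquor into the total amount of enzyme protein needed for a given liquor volume, and reports which of the stated ranges the dose falls in. The function names and the 15 l liquor volume in the usage lines are hypothetical choices for illustration only.

```python
# Dosage ranges stated in the text, in mg enzyme protein per liter of wash liquor.
DOSE_RANGES_MG_PER_L = {
    "broad": (0.01, 100.0),
    "preferred": (0.05, 5.0),
    "particular": (0.1, 1.0),
}


def enzyme_protein_mg(dose_mg_per_l: float, liquor_volume_l: float) -> float:
    """Total enzyme protein (mg) needed to reach the dose in the given liquor volume."""
    if dose_mg_per_l < 0 or liquor_volume_l < 0:
        raise ValueError("dose and volume must be non-negative")
    return dose_mg_per_l * liquor_volume_l


def matching_ranges(dose_mg_per_l: float) -> list[str]:
    """Names of the stated ranges (inclusive bounds) that contain the dose."""
    return [name for name, (low, high) in DOSE_RANGES_MG_PER_L.items()
            if low <= dose_mg_per_l <= high]


if __name__ == "__main__":
    dose = 0.5     # mg/l, hypothetical example dose
    volume = 15.0  # l of wash liquor, hypothetical
    print(enzyme_protein_mg(dose, volume))  # 7.5
    print(matching_ranges(dose))            # ['broad', 'preferred', 'particular']
```

Because the three ranges are nested, a dose in the narrowest range is reported as lying in all three.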

The enzyme of the invention may additionally be incorporated in the detergent formulations 
disclosed in WO 97/07202. 

The invention described and claimed herein is not to be limited in scope by the specific embodiments herein disclosed, since these embodiments are intended as illustrations of several aspects of the invention. Any equivalent embodiments are intended to be within the scope of this invention. Indeed, various modifications of the invention in addition to those shown and described herein will become apparent to those skilled in the art from the foregoing description. Such modifications are also intended to fall within the scope of the appended claims. In the case of conflict, the present disclosure including definitions will control.

Various references are cited herein, the disclosures of which are incorporated by reference 
in their entireties. 

EXAMPLES 
Materials and methods 

Strains: 

Bacillus subtilis PL1801 (Diderichsen, B. et al. 1990. Cloning of aldB, which encodes alpha-acetolactate decarboxylase, an exoenzyme from Bacillus brevis. J. Bacteriol. 172, 4315-4321)
Bacillus subtilis MB1053 
Bacillus subtilis PL3598-37 
Bacillus subtilis MB1510

Bacillus subtilis PL2306. This strain is the B.subtilis DN1885 with disrupted apr and npr genes (Diderichsen, B., Wedsted, U., Hedegaard, L., Jensen, B. R., Sjoholm, C. (1990) Cloning of aldB, which encodes alpha-acetolactate decarboxylase, an exoenzyme from Bacillus brevis. J. Bacteriol. 172, 4315-4321), which is also disrupted in the transcriptional unit of the known Bacillus subtilis cellulase gene, resulting in cellulase negative cells. The disruption was performed essentially as described in A.L. Sonenshein, J.A. Hoch and Richard Losick (eds.) (1993) Bacillus subtilis and other Gram-Positive Bacteria, American Society for Microbiology, p. 618.



Procedure for isolating genomic DNA

Harvest 1.5 ml culture and resuspend in 100 µl TEL. Leave at 37°C for 30 min.
Add 500 µl thiocyanate buffer and leave at room temperature for 10 min.
Add 250 µl NH4Ac and leave on ice for 10 min.
Add 500 µl CIA and mix.
Transfer to a microcentrifuge and spin for 10 min at full speed.
Transfer the supernatant to a new Eppendorf tube and add 0.54 volume of cold isopropanol. Mix thoroughly.
Spin and wash the DNA pellet with 70% EtOH.
Resuspend the genomic DNA in 100 µl TER.
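When several preparations are run in parallel, the per-sample volumes of the procedure above have to be scaled up. The sketch below does that arithmetic; the 10% pipetting excess is an assumption not found in the protocol, and the 850 µl supernatant in the usage line is a hypothetical value (roughly the 100 + 500 + 250 µl of aqueous reagents added per preparation).

```python
# Per-preparation volumes (µl) taken from the procedure above.
PER_PREP_UL = {
    "TEL": 100.0,
    "thiocyanate buffer": 500.0,
    "NH4Ac": 250.0,
    "CIA": 500.0,
    "TER": 100.0,
}


def reagent_totals(n_preps: int, excess: float = 0.1) -> dict[str, float]:
    """Total volume of each reagent (µl) for n preparations.

    `excess` is an assumed pipetting surplus (0.1 = 10%); it is not
    part of the original protocol.
    """
    if n_preps < 1:
        raise ValueError("need at least one preparation")
    factor = n_preps * (1.0 + excess)
    return {name: round(volume * factor, 1) for name, volume in PER_PREP_UL.items()}


def isopropanol_ul(supernatant_ul: float) -> float:
    """Cold isopropanol to add: 0.54 volume of the recovered supernatant."""
    return round(0.54 * supernatant_ul, 1)


print(reagent_totals(12))     # volumes for 12 preparations plus 10% excess
print(isopropanol_ul(850.0))  # 459.0
```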

TE: 10 mM Tris-HCl, pH 7.4, 1 mM EDTA, pH 8.0
TEL: 50 mg/ml lysozyme in TE-buffer
Thiocyanate: 5 M guanidium thiocyanate, 100 mM EDTA, 0.6% w/v N-laurylsarcosine, sodium salt. 60 g thiocyanate, 20 ml 0.5 M EDTA, pH 8.0, and 20 ml H2O dissolves at 65°C. Cool down to RT and add 0.6 g N-laurylsarcosine. Add H2O to 100 ml and filter it through a 0.2 µm sterile filter.
TER: 1 µg/ml RNase A in TE-buffer
CIA: Chloroform/isoamyl alcohol 24:1
NH4Ac: 7.5 M CH3COONH4

Purification of PCR bands and DNA sequencing

PCR fragments can be purified using the GFX™ PCR DNA and Gel Band™ Purification Kit (Pharmacia Biotech) according to the manufacturer's instructions. The nucleotide sequences of the amplified PCR fragments are determined on an ABI PRISM™ 3700 DNA Analyzer (Perkin Elmer, USA) using 50-100 ng as template, the Taq deoxy-terminal cycle sequencing kit (Perkin Elmer, USA), fluorescent labeled terminators and 5 pmol of the sequencing primer of choice.



Media 



TY: (As described in Ausubel, F. M. et al. (eds.) "Current protocols in Molecular Biology". John 
Wiley and Sons, 1995). 

LB agar: (As described in Ausubel, F. M. et al. (eds.) "Current protocols in Molecular Biology". John Wiley and Sons, 1995).

LB-PG agar is LB agar supplemented with 0.5% glucose and 0.05 M potassium phosphate, pH 7.0.

Proteolytic activity

S2A protease activity is measured using the PNA assay with succinyl-alanine-alanine-proline-phenylalanine-para-nitroanilide (Suc-AAPF-pNA) as a substrate unless otherwise mentioned. The principle of the PNA assay is described in Rothgeb, T.M., Goodlander, B.D., Garrison, P.H., and Smith, L.A., Journal of the American Oil Chemists' Society, Vol. 65 (5) pp. 806-810 (1988).
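In a pNA assay, the protease releases yellow p-nitroaniline, so the activity is proportional to the slope of absorbance over time. The sketch below converts such a slope into a release rate with the Beer-Lambert law. The document does not state the wavelength or extinction coefficient used; the default of 8,800 M^-1 cm^-1 (a value often quoted for p-nitroaniline near 410 nm) and the 1 cm path length are assumptions to be replaced by the assay's own calibration. For relative activities such as those in the tables of Example 2, the extinction coefficient cancels and only the ratio of slopes matters.

```python
def pna_rate_umol_per_min_per_ml(delta_a_per_min: float,
                                 epsilon_per_m_cm: float = 8800.0,
                                 path_cm: float = 1.0) -> float:
    """p-Nitroaniline released per minute per ml of reaction mixture.

    Beer-Lambert: c = A / (epsilon * l), so a slope in absorbance units per
    minute becomes a concentration change in mol/l per minute. The default
    epsilon (M^-1 cm^-1) is an assumption, not a value from the source.
    """
    if epsilon_per_m_cm <= 0 or path_cm <= 0:
        raise ValueError("epsilon and path length must be positive")
    mol_per_l_per_min = delta_a_per_min / (epsilon_per_m_cm * path_cm)
    return mol_per_l_per_min * 1000.0  # 1 mol/l equals 1000 µmol/ml


def relative_activity(sample_slope: float, reference_slope: float) -> float:
    """Activity relative to a reference measured under identical conditions."""
    if reference_slope <= 0:
        raise ValueError("reference slope must be positive")
    return sample_slope / reference_slope


print(pna_rate_umol_per_min_per_ml(0.088))  # approximately 0.01 µmol/min/ml
print(relative_activity(0.33, 0.10))        # approximately 3.3
```

To express the result per ml of culture broth, the rate would additionally be multiplied by the ratio of reaction volume to sample volume (and any dilution factor).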

Gene expression in Bacillus subtilis host

All the expressed genes in the following examples are integrated by homologous recombination into the Bacillus subtilis host cell genome. The genes are expressed under the control of a triple promoter system (as described in WO 99/43835), consisting of the promoters from the Bacillus licheniformis alpha-amylase gene (amyL), the Bacillus amyloliquefaciens alpha-amylase gene (amyQ), and the Bacillus thuringiensis cryIIIA promoter including stabilizing sequence. The gene coding for chloramphenicol acetyl-transferase was used as marker (described in e.g. Diderichsen, B.; Poulsen, G.B.; Joergensen, S.T.; A useful cloning vector for Bacillus subtilis. Plasmid 30:312 (1993)).

Example 1. Construction of synthetic 10R tail-variant genes with Savinase signal 

A synthetic 10R gene (10RS) encoding an S2A protease denoted 10R from Nocardiopsis sp. NRRL 18262 (WO 01/58276) was constructed, which has the nucleotide sequence shown in SEQ ID NO: 1. This synthetic gene was fused by PCR in frame to the DNA coding for the signal peptide from SAVINASE™ (Novozymes), resulting in the coding sequence Sav-10RS, which is shown in SEQ ID NO: 2. Several tail-variants of this construct were made. Compared to the Sav-10RS protease encoded by SEQ ID NO: 2, the tail-variant construct Sav-10RS HVO was constructed to have 8 extra amino acids at the C-terminus: QSHVQSAP (SEQ ID NO: 3), which were encoded by the following DNA sequence extension inserted in front of the TAA stop codon of SEQ ID NO: 2:

(SEQ ID NO: 4): caatcgcatgttcaatccgctcca 
Tail variant Sav-10RS HV1 was constructed to have 4 extra amino acids at the C-terminus: QSAP (SEQ ID NO: 5), with the following DNA sequence extension inserted in front of the TAA stop codon:

(SEQ ID NO: 6): caatcggctcct 

Tail variant Sav-10RS HV3 was constructed to have 2 extra amino acids at the C-terminus: QP (SEQ ID NO: 7), with the following DNA sequence extension inserted in front of the TAA stop codon:

(SEQ ID NO: 8): caacca 


Tail variant Sav-10RS HV2 was constructed to have one extra amino acid at the C-terminus: P (SEQ ID NO: 9), with the following DNA sequence extension inserted in front of the TAA stop codon:

(SEQ ID NO: 10): cca 
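The four DNA extensions listed above can be checked against the intended tail peptides by translating them with the standard genetic code. The sketch below builds the codon table from the usual TCAG ordering and translates each extension; the dictionary keys are just the construct names used in this example.

```python
from itertools import product

BASES = "tcag"
# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate an in-frame DNA sequence into one-letter amino acid codes."""
    dna = dna.lower()
    if len(dna) % 3:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


TAIL_EXTENSIONS = {
    "HVO": "caatcgcatgttcaatccgctcca",  # SEQ ID NO: 4
    "HV1": "caatcggctcct",              # SEQ ID NO: 6
    "HV3": "caacca",                    # SEQ ID NO: 8
    "HV2": "cca",                       # SEQ ID NO: 10
}

for name, extension in TAIL_EXTENSIONS.items():
    # Each extension is inserted in front of the TAA stop codon, so translation ends there.
    print(name, translate(extension + "taa"))
```

The output reproduces SEQ ID NOs: 3, 5, 7 and 9 followed by the stop symbol: QSHVQSAP*, QSAP*, QP* and P*.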

The 10RS gene and the four tail-variant encoding genes were integrated by homologous recombination into the Bacillus subtilis MB1053 host cell genome. Chloramphenicol resistant transformants were checked for protease activity on 1% skim milk LB-PG agar plates (supplemented with 6 µg/ml chloramphenicol). Some protease positive colonies were further analyzed by DNA sequencing of the insert to ensure the correct gene DNA sequence, and five strains, each comprising one of the above constructs, were selected and denoted, respectively: B.subtilis Sav-10RS, B.subtilis Sav-10RS HVO, B.subtilis Sav-10RS HV1, B.subtilis Sav-10RS HV2 and B.subtilis Sav-10RS HV3.

Example 2. Fermentation yields of 10R tail-variants with Savinase signal 

Fermentations for the production of the tail-variant enzymes of the invention were performed on a rotary shaking table in 500 ml baffled Erlenmeyer flasks, each containing 100 ml TY supplemented with 6 mg/l chloramphenicol.

Six Erlenmeyer flasks for each of the five B. subtilis strains from Example 1 were fermented in parallel. Two of the six flasks were incubated at 37°C (250 rpm), two at 30°C (250 rpm), and the last two at 26°C (250 rpm). A sample was taken from each shake flask on days 1, 2 and 3 and analyzed for proteolytic activity. The results are shown in Tables 1-3. As can be seen from Tables 1-3, the tails give a surprisingly high improvement in the expression level of the protease, as measured by activity in the culture broth. The effect is most pronounced at 26°C and 30°C, but is also evident at 37°C, especially at the early stage of the fermentation.
Table 1: Relative proteolytic activities at 37°C.

                 Day 1   Day 2   Day 3
Sav-10RS          1.0     1.0     1.0
Sav-10RS HVO      3.3     0.7     0.8
Sav-10RS HV1      4.7     1.3     1.2
Sav-10RS HV2      2.2     0.6     0.4
Sav-10RS HV3      5.3     1.4     1.7


Table 2: Relative proteolytic activities at 30°C.

                 Day 1   Day 2   Day 3
Sav-10RS                  1.0     1.0
Sav-10RS HVO              2.2     2.9
Sav-10RS HV1      4.6     3.1     4.9
Sav-10RS HV2      2.4     1.9     2.3
Sav-10RS HV3      4.8     3.0     4.4


Table 3: Relative proteolytic activities at 26°C.

                 Day 1   Day 2   Day 3
Sav-10RS                  1.0     1.0
Sav-10RS HVO      1.8     2.5     3.1
Sav-10RS HV1      2.5     3.6     4.3
Sav-10RS HV2      1.8     2.6     2.8
Sav-10RS HV3      2.6     3.5     4.6
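Tables 1-3 report relative activities. Assuming, as the reference row of 1.0 suggests, that each entry is the activity of a construct divided by that of Sav-10RS on the same day and at the same temperature, the sketch below shows that normalisation and then uses the complete Table 1 values to pick the highest-expressing construct on each day. The function names are illustrative, and the raw numbers in the usage line are invented.

```python
# Relative activities at 37 °C (Table 1); Sav-10RS is the reference (1.0).
TABLE_1_37C = {
    "Sav-10RS": (1.0, 1.0, 1.0),
    "Sav-10RS HVO": (3.3, 0.7, 0.8),
    "Sav-10RS HV1": (4.7, 1.3, 1.2),
    "Sav-10RS HV2": (2.2, 0.6, 0.4),
    "Sav-10RS HV3": (5.3, 1.4, 1.7),
}


def normalise(raw: dict[str, tuple[float, ...]], reference: str) -> dict[str, tuple[float, ...]]:
    """Divide each construct's activities by the reference construct's activity on the same day."""
    ref = raw[reference]
    return {name: tuple(value / r for value, r in zip(values, ref))
            for name, values in raw.items()}


def best_per_day(table: dict[str, tuple[float, ...]]) -> list[str]:
    """Construct with the highest relative activity for each sampling day."""
    n_days = len(next(iter(table.values())))
    return [max(table, key=lambda name: table[name][day]) for day in range(n_days)]


print(normalise({"ref": (2.0, 4.0), "x": (6.0, 2.0)}, "ref"))  # hypothetical raw activities
print(best_per_day(TABLE_1_37C))  # ['Sav-10RS HV3', 'Sav-10RS HV3', 'Sav-10RS HV3']
```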



Example 3. Chromosomal integration of tail-variant genes 

The following construct was used for the chromosomal integration of the tail-variant encoding genes. The coding sequence of the well-known subtilisin BPN' protease was operationally linked to a triple promoter, a marker gene was fused to this (a spectinomycin resistance gene surrounded by resolvase res-sites), and pectate lyase encoding genes from Bacillus subtilis were fused to the construct as flanking segments comprising the 5' polynucleotide region upstream [yfmD-yfmC-yfmB-yfmA-Pel-start] and the 3' polynucleotide region downstream [Pel-end-yflS-citS(start)] of the tail-variant encoding polynucleotide, respectively. The integrational cassette was made by joining several different PCR fragments. After the final PCR reaction, the PCR product was used for transformation of naturally competent B. subtilis cells. One clone, denoted PL3598-37, was selected and confirmed by sequencing to contain the correct construct.



The PL3598-37 clone thus contains the following: 



10495.020-OK 






1. The flanking regions, 100% homologous to a region of the B.subtilis genome (appearing as the upstream fragment yfmD-yfmC-yfmB-yfmA-Pel-start and the downstream fragment Pel-end-yflS-citS(start)).

2. The spectinomycin resistance gene flanked by resolvase sites (res).

3. The triple promoter region plus the cryIIIA mRNA stabilising leader sequence.

4. The BPN' open reading frame.

Construction of triple promoter BPN' cassette

A PCR fragment comprising the integrational cassette for a BPN' library was constructed, thus operably linking a triple promoter (as described in WO 99/43835; Novozymes) to a BPN' expression cassette from a Bacillus strain. The triple promoter is a fusion of an optimized Bacillus amyL-derived promoter (as shown in WO 93/10249; Novozymes) with two promoters, scBAN and cryIIIA, where the first is a consensus version of the Bacillus amyloliquefaciens amylase BAN promoter, and the latter includes an mRNA-stabilising sequence (as described in WO 99/43835; Novozymes). Suitable primers can be derived from the publicly available sequences (Vasantha, N. et al. Genes for alkaline protease and neutral protease from Bacillus amyloliquefaciens contain a large open reading frame between the regions coding for signal sequence and mature protein. J. Bacteriol. 159:811 (1984); EMBL accession No. K02496). A KpnI and a SalI restriction site were introduced to flank the PCR fragment at each end, using the primers:

#252639 (SEQ ID NO: 11): catgtgcatgtgggtaccgcaacgttcgcagatgctgctgaagag 
#251992 (SEQ ID NO: 12): catgtgcatgtggtcgaccgattatggagcggattgaacatgcg 
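A quick way to confirm that primers carry the intended restriction sites is to scan them for the enzymes' recognition sequences. The sketch below does this for the two primers above. The recognition sequences listed (KpnI GGTACC, SalI GTCGAC, NotI GCGGCCGC, EcoRI GAATTC, BglII AGATCT) are all palindromic, so scanning one strand is sufficient; the function name is illustrative.

```python
# Recognition sequences of the enzymes used in these examples (all palindromic).
SITES = {
    "KpnI": "ggtacc",
    "SalI": "gtcgac",
    "NotI": "gcggccgc",
    "EcoRI": "gaattc",
    "BglII": "agatct",
}


def find_sites(seq: str, sites: dict[str, str] = SITES) -> dict[str, list[int]]:
    """0-based start positions of each recognition sequence found in seq."""
    seq = seq.lower()
    hits = {}
    for name, motif in sites.items():
        positions = [i for i in range(len(seq) - len(motif) + 1)
                     if seq.startswith(motif, i)]
        if positions:
            hits[name] = positions
    return hits


PRIMER_252639 = "catgtgcatgtgggtaccgcaacgttcgcagatgctgctgaagag"  # SEQ ID NO: 11
PRIMER_251992 = "catgtgcatgtggtcgaccgattatggagcggattgaacatgcg"   # SEQ ID NO: 12

print(find_sites(PRIMER_252639))  # {'KpnI': [12]}
print(find_sites(PRIMER_251992))  # {'SalI': [12]}
```

In both primers the site is preceded by 12 extra bases (catgtgcatgtg), which gives the restriction enzyme room to cut near the end of the PCR product.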

The KpnI and SalI restriction sites in the PCR fragment were subsequently used to clone the fragment into a KpnI-SalI digested Peel-Spec PCR fragment. The Peel-Spec fragment comprises a spectinomycin resistance gene inserted in the middle of the B.subtilis pectate lyase gene, plus approx. 2.3 kb of upstream genomic DNA and approx. 1.7 kb of downstream genomic DNA. The Peel-Spec fragment was produced by PCR amplification of genomic DNA from the B.subtilis strain MB1053, using the primers:



#179541 (SEQ ID NO: 13): gcgttgagacgcgcggccgcgagcgccgtttggctgaatgatai 
#179542 (SEQ ID NO: 14): gcgttgagacagctcgagcagggaaaaatggaaccgcttttt. 
Construction of MB1053
The MB1053 B.subtilis strain was constructed by deletion of the pectate lyase (Pel) gene through integration of a PCR product into a wild-type B.subtilis type strain genome. This was achieved by PCR amplification of genomic DNA directly downstream and upstream of the pectate lyase gene of B.subtilis.

The ends of the genomic DNA directly preceding and following the Pel gene were elongated through primer insertion of sequences 100% homologous to DNA sequences defined by the ends of a third PCR fragment encoding a marker gene surrounded by resolvase (Res) sites. In this particular case the marker gene (Spec) conferred resistance to spectinomycin, and it was situated between two Res sites, all present on the plasmid pSJ3358 (described in US patent No. 5,882,888). Three different PCR fragments were initially produced.
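The fusion strategy described above relies on primer-encoded overlaps: the end of one fragment is made identical to the start of the next, so the joined product is the fragments concatenated with each overlap counted once. The sketch below computes that expected product sequence in silico. It models only the resulting sequence, not the PCR itself; the toy sequences are invented, and the default 15-base minimum overlap is an arbitrary assumption.

```python
def longest_overlap(left: str, right: str, min_len: int = 15) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 if there is no overlap of at least min_len bases.
    """
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    left, right = left.lower(), right.lower()
    for k in range(min(len(left), len(right)), min_len - 1, -1):
        if left[-k:] == right[:k]:
            return k
    return 0


def join_by_overlap(left: str, right: str, min_len: int = 15) -> str:
    """Expected product when two fragments sharing an end overlap are fused."""
    k = longest_overlap(left, right, min_len)
    if k == 0:
        raise ValueError(f"no overlap of at least {min_len} bases")
    return left + right[k:]


# Toy fragments sharing the 8-base overlap "ggggtttt" (hypothetical sequences).
fragment_a = "aaaaccccggggtttt"
fragment_b = "ggggttttacgtacgt"
print(join_by_overlap(fragment_a, fragment_b, min_len=4))  # aaaaccccggggttttacgtacgt
```

Applied twice, first to fragments 1 and 3 and then to that product and fragment 2, this would give the sequence of the final joined fragment, provided all three are written on the same strand.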

Fragment 1: this fragment covers from the yfmD gene to the middle of the Pel gene and introduces an overhang to the Res-Spec-Res cassette at the Pel gene. The size of fragment 1 is 2.8 kb. The fragment was produced by PCR amplification of chromosomal DNA from the B.subtilis strain PL2306, using the primers:

#179541 (SEQ ID NO: 13), and
#179539 with overlap to #179154 Spec primer (SEQ ID NO: 15):



Fragment 2: this fragment covers from the middle of the Pel gene to after the end of the citS gene and introduces an overhang to the Res-Spec-Res cassette at the middle of the Pel gene. The size of fragment 2 is 2.3 kb. The fragment was produced by PCR amplification of chromosomal DNA from the B.subtilis strain PL2306, using the primers:

#179542 (SEQ ID NO: 14), and

#179540 with overlap to #179153 Spec primer (SEQ ID NO: 16):



Fragment 3: this fragment contains the spectinomycin gene surrounded by Res sites and DNA sequences at the ends overlapping with PCR fragments 1 and 2. The size of fragment 3 is 1.6 kb. Fragment 3 was produced by PCR amplification of plasmid pSJ3358, using the primers:

#179154 (SEQ ID NO: 17): gttgtaaaacgacggccagtgaattctgatcaaatgg 
#179153 (SEQ ID NO: 18): ccgcgtcgacactagacacgggtacctgatctagatc 
Standard conditions for the PCR reaction

For the PCR amplifications of fragments 1-3 the HiFi Expand™ PCR system (Roche) was used together with the following reaction mix and cycling scheme:

5 µl Buffer 2
5 µl dNTPs (1.25 mM each)
2.5 µl 20 µM primer 1
2.5 µl 20 µM primer 2
x µl water

To this mix 3 µl of DNA (approx. 100 ng) and 0.75 µl Enzyme mix (use hot start) are added. Total volume is 50 µl.
The cycling profile is:

1 cycle of 120 sec at 94°C
Break.
10 cycles of 15 sec at 94°C, 60 sec at 60°C, 240 sec at 72°C.
20 cycles of 15 sec at 94°C, 60 sec at 60°C, 180 sec at 72°C (add 20 sec per cycle).
1 cycle of 600 sec at 68°C.
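The "x µl water" in the reaction mix above follows from the stated total of 50 µl, and the "add 20 sec per cycle" step means the extension time grows through the last stage. The sketch below makes both calculations explicit. Interpreting the increment as applying to every cycle after the first is an assumption, as is ignoring ramp times when totalling the program.

```python
def water_volume_ul(total_ul: float, other_components_ul: list[float]) -> float:
    """Water needed to bring a reaction to its final volume."""
    water = total_ul - sum(other_components_ul)
    if water < 0:
        raise ValueError("components exceed the final volume")
    return water


def extension_times_s(n_cycles: int, start_s: int, increment_s: int) -> list[int]:
    """Extension time of each cycle for an 'add X sec per cycle' step.

    Assumption: the first cycle uses start_s and each later cycle adds
    increment_s to the previous one.
    """
    return [start_s + increment_s * i for i in range(n_cycles)]


def standard_program_hold_time_s() -> int:
    """Sum of the hold times in the cycling profile above (ramping ignored)."""
    initial = 120
    stage_1 = 10 * (15 + 60 + 240)
    stage_2 = sum(15 + 60 + ext for ext in extension_times_s(20, 180, 20))
    final = 600
    return initial + stage_1 + stage_2 + final


components = [5.0, 5.0, 2.5, 2.5, 3.0, 0.75]  # buffer, dNTPs, 2 primers, DNA, enzyme mix
print(water_volume_ul(50.0, components))     # 31.25
print(extension_times_s(20, 180, 20)[-1])    # 560
print(standard_program_hold_time_s())        # 12770 (about 3.5 h)
```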



The three PCR fragments were made and joined in later JOINING-PCR reactions. The three PCR fragments were single sharp bands and no gel purification was necessary. Only Qiagen™ PCR purification was performed prior to the following JOINING-PCR.
JOINING of fragment 1 + 3 (same procedure for fragment 2 + 3):
5 µl Buffer 2
8 µl dNTPs (1.25 mM each)
5.0 µl Fragment 3
5.0 µl Fragment 1
9.25 µl water

1 cycle of 120 sec at 94°C.
Break. Add Enzyme
10 cycles of 15 sec at 94°C, 60 sec at 60°C, 240 sec at 72°C.
Break. Add Primers
15 cycles of 15 sec at 94°C, 60 sec at 60°C, 180 sec at 72°C (add 20 sec per cycle).
1 cycle of 600 sec at 68°C.

After the first cycle at 94°C for 120 sec there is a break, where 0.75 µl Enzyme mix is added. Total volume is now 45.0 µl.
After the initial 10 cycles, there is another break in the cycling: for fragment 1+3, 2.5 µl (20 µM #179541) and 2.5 µl (20 µM #179153) are added, and for fragment 2+3, 2.5 µl (20 µM #179542) and 2.5 µl (20 µM #179154) are added, and the cycling is continued for 15 more cycles.

The PCR products were then gel purified: the size of fragment 1+3 should be 3.4 kb and the size of fragment 2+3 should be 3.4 kb. These two fragments were joined in a last PCR reaction (Expand™ Long system, Roche):

5 µl Buffer 1
14 µl dNTPs (1.25 mM each)
5.0 µl Fragment 1+3
5.0 µl Fragment 2+3
17.75 µl water

After the first cycle at 94°C for 120 sec there is a break, where 0.75 µl Enzyme mix is added. Total volume is now 45.0 µl.

After the initial 10 cycles, there is another break in the cycling, 2.5 µl (20 µM #179541) and 2.5 µl (20 µM #179542) are added, and the cycling is continued for 15 more cycles.

1 cycle of 120 sec at 94°C.
Break. Add Enzyme
10 cycles of 15 sec at 94°C, 60 sec at 60°C, 240 sec at 68°C.
Break. Add Primers
15 cycles of 15 sec at 94°C, 60 sec at 60°C, 180 sec at 68°C (add 20 sec per cycle).
1 cycle of 600 sec at 68°C.
The size of the joined PCR fragment is 6.8 kb. This PCR fragment was purified using a Qiagen™ PCR purification kit, and 5 µl of the 50 µl eluted DNA was used to transform a standard B.subtilis strain. After transformation, cells were spread onto LB-PG plates with 120 µg/ml spectinomycin. The next day more than 1000 colonies were seen. Eight of these were checked by PCR with the primers from the last JOINING-PCR amplification, yielding a PCR fragment of 6.8 kb rather than the 5.2 kb expected if deletion had not occurred. Furthermore, the pectate lyase activity of the clones was checked with the Mancini immunoassay, which showed no pectate lyase reactivity. Together with the spectinomycin resistance, this indicates that deletion had occurred. One such clone was selected and denoted MB1053.

Insertion of BPN' expression cassette adjacent to the res-spec-res in MB1053

The ligation mix of the digested PCR-amplified triple promoter BPN' expression cassette and the KpnI-SalI digested Peel-Spec PCR fragment was used as template in a PCR amplification using the PCR primers #179541 and #179542. This resulted in a PCR fragment of approx. 9 kb, which was used to transform competent cells of B.subtilis PL1801 (Diderichsen, B. et al. 1990. Cloning of aldB, which encodes alpha-acetolactate decarboxylase, an exoenzyme from Bacillus brevis. J. Bacteriol. 172, 4315-4321). The transformed cells were plated on LB agar plates with 120 µg/ml spectinomycin and skim milk. Spectinomycin resistant colonies with large skim milk clearing zones were restreaked on spectinomycin agar plates and analysed for integration of the PCR fragment by PCR using the primers #179541 (SEQ ID NO: 13) and #179542 (SEQ ID NO: 14). Appearance of a 9 kb fragment indicates that the PCR fragment has been integrated into the host cell genome. Several of these clones were sequenced to confirm integration of the expression cassette, and one such clone was selected and denoted PL3598-37.

Example 4. Construction of plasmid-borne chromosomal integrational cassette

An E. coli plasmid-borne integrational cassette for a library may be constructed in vivo. An integration cassette to be used according to the method of the invention may be present on an E. coli plasmid (which is capable of replication only in E. coli, not in B.subtilis), the plasmid comprising:

i) the DNA sequence encoding the Pre-Pro-domains of the subtilisin protease commonly known as Savinase, preceded by and operably linked to

ii) a DNA sequence comprising an mRNA stabilising segment, derived in this particular case from the cryIIIA gene;

iii) a marker gene (a chloramphenicol resistance gene); and

iv) genomic DNA from Bacillus subtilis as 5' and 3' flanking segments: the homologous 5' polynucleotide region upstream of the polynucleotide [yfmD-yfmC-yfmB-yfmA-Pel-start], and the 3' polynucleotide region downstream of the polynucleotide [Pel-end-yflS-citS(start)], respectively.

The cassette was made in the commonly used plasmid pUC19 by several cloning steps, in which pUC19 and several different PCR fragments were digested with appropriate restriction endonucleases and ligated. After each ligation of a PCR fragment into a plasmid, the ligation mixture was transformed into electrocompetent DH5alpha E. coli cells by electroporation using a Gene Pulser™ electroporator from BIO-RAD as described by the supplier. One final plasmid construct was confirmed by sequencing to contain the correct construct as outlined above, and it was denoted pMB1508.



The pMB1508 plasmid thus contains the following:

i) The cryIIIA mRNA stabilising leader sequence including a ribosome binding sequence (RBS), operationally linked to

ii) DNA encoding the Pre-Pro-domains of the subtilisin commonly known as Savinase, including KpnI and NotI sites for cloning;

iii) The chloramphenicol resistance operon;

iv) The 3' downstream flanking region [Pel-end-yflS-citS(start)], which is 99-100% homologous to the corresponding region of the B.subtilis genome.

The four elements listed were cloned in the pUC19 vector (isolated from E. coli ATCC 37254; Vieira J, Messing J. The pUC plasmids, an M13mp7-derived system for insertion mutagenesis and sequencing with synthetic universal primers. Gene 19: 259-268, 1982) in the EcoRI and SalI sites to give pMB1508. In order for the resulting plasmid to integrate efficiently into a specified site of the B.subtilis genome, a new strain was established. The new strain is a derivative of Bacillus subtilis 168 (BGSC accession number 1A1, 168 trpC2). The strain was made competent and transformed as described above. Using elements from the PL3598-37 clone described above, the new integration strain, denoted MB1510, was established and characterised to contain the following elements from PL3598-37:

i) The triple promoter and the mRNA stabilising element.

ii) Flanking segments comprising the homologous polynucleotide region [yfmD-yfmC-yfmB-yfmA-Pel-start] upstream of the triple promoter, and the polynucleotide region [Pel-end-yflS-citS(start)] downstream of the mRNA stabilizing element.

Thus, when using MB1510 competent cells, it is possible for pMB1508 (or derivatives thereof) to integrate directly into the genome of MB1510, where the two flanking regions fused to the triple promoter and mRNA stabilising element are located. This results in a construct where the incoming PrePro encoding DNA of pMB1508 has been integrated in the correct reading frame with the triple promoter, the mRNA stabilising element and the RBS, and thus in high expression of the integrated gene from the promoter elements already present on the genome of MB1510.

Transformation efficiency was established for the B.subtilis strain MB1510 transformed with E. coli-prepared plasmid pMB1508. For further testing of the potential of this approach, the Savinase encoding gene of Bacillus clausii was PCR amplified using the two PCR primers:



Primer #317 (SEQ ID NO: 19) tggcgcaatcggtaccatgggg 

Primer #139 NotI (SEQ ID NO: 20) catgtgcatgcggccgcattaacgcgttgccgcttctgcg
The resulting approx. 0.8 kb Savinase fragment and the pMB1508 plasmid are digested with KpnI and NotI, and the resulting fragments are then purified by agarose gel electrophoresis. The two fragments are ligated, and the ligation mixture is used to transform competent E. coli cells, which are then plated on LB-agar plates or placed in liquid media for growth overnight at 37°C; both types of media contain 50-100 µg/ml of ampicillin. After incubation, a plasmid prep is made of the liquid culture. The purified plasmid is used for transformation of competent cells of MB1510 (using 100-10,000 ng of plasmid per transformation). The transformed cells are plated onto TY medium with 2% skim milk and 6 µg/ml of chloramphenicol for selection. After overnight incubation at 37°C, clearing zones appear around those colonies in which the integration cassette is properly integrated, indicating high Savinase expression.

This approach can also be used to make highly diverse libraries of any gene of interest expressible in B.subtilis, where, rather than a gene encoding one enzyme, any expressible polynucleotide is inserted into the plasmid pMB1508 and integrated into the MB1510 strain for subsequent screening.

Sequence of plasmid pMB1508 (SEQ ID NO: 21)

The plasmid pMB1508 has the following components, indicated by basepair positions:

BP 5186-395: pUC19 sequence from E. coli clone ATCC 37254 (Vieira J, Messing J. The pUC plasmids, an M13mp7-derived system for insertion mutagenesis and sequencing with synthetic universal primers. Gene 19: 259-268, 1982).

BP 396-1021: EcoRI cloning site (BP 396-401) and the cryIIIA mRNA stabilising element (described in WO 9634963-A1).

BP 1022-1412: Encodes the Pre-Pro sequence of Savinase and the NotI cloning site (Pre-Pro part described in e.g. WO 9623073-A1; the NotI site and the spacing between the Pre-Pro and NotI were introduced by the PCR primer).

BP 1413-2512: The BglII cloning site (BP 1413-1418) and the chloramphenicol acetyl-transferase operon of pDN1050 (described in e.g. Diderichsen, B.; Poulsen, G.B.; Joergensen, S.T.; A useful cloning vector for Bacillus subtilis. Plasmid 30:312 (1993)).

BP 2513-5185: The polynucleotide region [Pel-end-yflS-citS(start)] downstream of the pelB locus of the B.subtilis genome (as it appears from the publication and corresponding database of: F. Kunst, N. Ogasawara, I. Moszer, <146 other authors>, H. Yoshikawa, A. Danchin. "The complete genome sequence of the Gram-positive bacterium Bacillus subtilis" Nature (1997) 390:249-256).
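Because pMB1508 is circular, the pUC19 segment given as BP 5186-395 runs through position 1, so its length cannot be computed as end minus start. The sketch below handles 1-based inclusive coordinates on a circle and checks that the listed components tile the plasmid without gaps. The total plasmid length is not stated in this section (it is the length of SEQ ID NO: 21), so the 6000 bp used here is a hypothetical placeholder.

```python
def feature_length(start: int, end: int, plasmid_length: int) -> int:
    """Length of a feature with 1-based, inclusive coordinates on a circular plasmid.

    If end < start, the feature runs through the origin (position 1).
    """
    for pos in (start, end):
        if not 1 <= pos <= plasmid_length:
            raise ValueError(f"position {pos} outside 1..{plasmid_length}")
    if end >= start:
        return end - start + 1
    return (plasmid_length - start + 1) + end


def is_contiguous(features: list[tuple[int, int]], plasmid_length: int) -> bool:
    """True if each feature starts on the base after the previous one ends, around the circle."""
    for (_, end), (next_start, _) in zip(features, features[1:] + features[:1]):
        if next_start != end % plasmid_length + 1:
            return False
    return True


PMB1508_FEATURES = [(5186, 395), (396, 1021), (1022, 1412), (1413, 2512), (2513, 5185)]
ASSUMED_LENGTH = 6000  # hypothetical; the real value is the length of SEQ ID NO: 21

for start, end in PMB1508_FEATURES:
    print(start, end, feature_length(start, end, ASSUMED_LENGTH))
print(is_contiguous(PMB1508_FEATURES, ASSUMED_LENGTH))  # True
```

Only the wrap-around pUC19 segment depends on the assumed total; the other four lengths (626, 391, 1100 and 2673 bp) follow directly from the listed positions.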



The Bacillus subtilis strain MB1510

MB1510 has the following specific features in and around the pelB locus:

i) The triple promoter and the mRNA stabilising element including an RBS (ribosome binding sequence).

ii) Flanking segments comprising the homologous polynucleotide region [yfmD-yfmC-yfmB-yfmA-Pel-start] upstream of the triple promoter, and the polynucleotide region [Pel-end-yflS-citS(start)] downstream of the mRNA stabilizing sequence.

Sequence of MB1510 genomic integration region (SEQ ID NO: 99)

BP 1-2873: corresponds to the sequence of the Bacillus subtilis genome yfmD-yfmC-yfmB-yfmA-Pel-start (as it appears from the publication and corresponding database of: F. Kunst et al. "The complete genome sequence of the Gram-positive bacterium Bacillus subtilis" Nature (1997) 390:249-256).

BP 3102-4082: The triple promoter and cryIIIA mRNA stabilising element plus RBS (described above in the PL3598-37 construct).

BP 4083-5718: The polynucleotide region [Pel-end-yflS-citS(start)] at the end of and downstream of the pelB locus of the B.subtilis genome (as it appears from the publication and corresponding database of: F. Kunst, N. Ogasawara, I. Moszer, <146 other authors>, H. Yoshikawa, A. Danchin. "The complete genome sequence of the Gram-positive bacterium Bacillus subtilis" Nature (1997) 390:249-256).

Example 5. Construction of a 2 amino-acid tail-variant library 

This example shows the construction of a tail-variant library. In this library two amino acids were introduced at the C-terminus of the 10R protein. Such a tail library may be made with the method described above, using the following PCR primers in a PCR reaction with genomic DNA from B.subtilis 10RS as template:

1605 (SEQ ID NO: 23): gacggccagtgaattcgataaaagtgc 

1606 (SEQ ID NO: 24): ccagatctctatnktnktgtacggagtctaactccccaagag 

wherein N = A, C, G or T, and K = G or T.
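The residues that primer 1606 can place in the tail can be checked by translating every codon its degenerate triplets allow. The sketch below assumes that 1606 is the reverse (antisense) primer, read in the frame ...cta tnk tnk tgt..., so that each randomized primer triplet tnk appears as its reverse complement MNA (M = A or C) on the coding strand; it also assumes the standard genetic code. The helper names are hypothetical.

```python
from itertools import product

# Standard genetic code; codons enumerated in the order TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# IUPAC symbols needed here and their complements (K = G/T, M = A/C).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "K": "GT", "M": "AC"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N", "K": "M", "M": "K"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a (possibly degenerate) DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def expand(degenerate: str) -> list[str]:
    """All concrete sequences represented by a degenerate sequence."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in degenerate.upper()))]


# One randomized triplet of primer 1606, in the assumed frame ...cta tnk tnk tgt...
coding_triplet = reverse_complement("tnk")          # coding-strand codon pattern
codons = expand(coding_triplet)                     # the concrete codons it allows
encoded = {CODON_TABLE[codon] for codon in codons}  # amino acids they specify
print(coding_triplet, sorted(codons), sorted(encoded))
```

Under these assumptions the eight MNA codons encode K, R, T, I, Q, P and L (R via both AGA and CGA) and no stop codon, which matches the possibilities listed for each position in Table 5.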

The resulting PCR product was digested with EcoRI and BglII and ligated into EcoRI/BglII-digested pMB1508. From here on, the procedure followed the principle described above.



Chloramphenicol-resistant Bacillus subtilis transformants were picked by a robotic colony picker from a bioassay plate and transferred into a 384-well microtiter plate (MTP) containing 0.05 X TY supplemented with 6 mg/l chloramphenicol (60 µl/well). The MTPs were incubated at 26°C for 72 h. After incubation, each well was analyzed for proteolytic activity.

The thirty Bacillus subtilis transformants with the highest proteolytic activity were selected for determination of the two tail amino acids in each transformant by DNA sequencing; the sequencing results are summarized in Table 4 and Table 5.



AA tail   No. of transformants
TL        4
TT        4
QL        3
TP        3
LP        3
TI        2
IQ        2
QP        2
PI        2
LT        1
TQ        1
IT        1
QQ        1
PQ        1
Total     30



Table 4: column one shows the amino acid sequence of the tail, and column two shows the number of Bacillus subtilis transformants sequenced with that particular AA tail sequence.



Possibilities position 1   Result   Possibilities position 2   Result
K                          0        K                          0
R                          0        R                          0
T                          14       T                          6
I                          3        I                          4
Q                          6        Q                          5
P                          3        P                          8
L                          4        L                          7
Total                      30       Total                      30



Table 5: The table shows the amino acid which could be introduced by the primer used for the 
library construct and the actual findings by DNA sequencing of the thirty colonies isolated from 
screening. 
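Table 5 can be derived from Table 4 by counting how often each residue occurs at each of the two tail positions. The sketch below performs that tally. The counts of one for the tails LT, TQ, IT, QQ and PQ are the only values consistent with the stated total of 30; the function name is hypothetical.

```python
from collections import Counter

# Tail sequences and numbers of sequenced transformants, as in Table 4.
TAIL_COUNTS = {
    "TL": 4, "TT": 4, "QL": 3, "TP": 3, "LP": 3, "TI": 2, "IQ": 2,
    "QP": 2, "PI": 2, "LT": 1, "TQ": 1, "IT": 1, "QQ": 1, "PQ": 1,
}


def tally_position(tails: dict[str, int], position: int) -> Counter:
    """Count how often each amino acid occurs at one tail position (0-based)."""
    counts = Counter()
    for tail, n in tails.items():
        counts[tail[position]] += n
    return counts


first = tally_position(TAIL_COUNTS, 0)
second = tally_position(TAIL_COUNTS, 1)
print("position 1:", dict(first))
print("position 2:", dict(second))
```

The two tallies reproduce both "Result" columns of Table 5, including the zero counts for K and R, which never appear in the sequenced tails.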

Example 6. Construction of Bacillus subtilis strains L2, L2 HVO, and L2 HV1 

A Bacillus subtilis strain was made analogously to the construction of the Bacillus subtilis strain 10RS, with the DNA coding for the pro-form of the S2A protease from Nocardiopsis dassonvillei subsp. dassonvillei DSM 43235, denoted L2, fused in frame by PCR to the DNA coding for the signal peptide from SAVINASE™ (a well-known commercial protease derived from Bacillus clausii, available from Novozymes, Denmark). The resulting strain was denoted Bacillus subtilis Sav-L2.

The DNA sequence including the coding region for the pro-mature S2A protease from Nocardiopsis dassonvillei subsp. dassonvillei DSM 43235, as amplified with primers 1423 and 1475, is shown in SEQ ID NO: 25. The corresponding encoded pro-form amino acid sequence of the L2 protease is shown in SEQ ID NO: 28.

1423 (SEQ ID NO: 26): gcttttagttcatcgatcgcatcggctgctccggcccccgtcccccag 
1475 (SEQ ID NO: 27): ggagcggattgaacatgcgattaggtccggatcctgacaccccag 

Two tail variants of this construct were also made. Tail variant Sav-L2 HVO was constructed to have 8 extra amino acids at the C-terminus, QSHVQSAP (SEQ ID NO: 3), by inserting the DNA sequence extension shown in SEQ ID NO: 4 in front of the TAA stop codon. Tail variant Sav-L2 HV1 was constructed to have 4 extra amino acids at the C-terminus, QSAP (SEQ ID NO: 5), by inserting the DNA sequence extension shown in SEQ ID NO: 6 in front of the TAA stop codon. Both tail variants had the SAVINASE™ signal-peptide encoding sequence fused in frame with the pro-mature encoding sequence, just like Sav-L2.

The Sav-L2 gene and the two tail variants Sav-L2 HVO and Sav-L2 HV1 were integrated by homologous recombination into the Bacillus subtilis MB1053 host cell genome as outlined above. Chloramphenicol-resistant transformants were checked for protease activity on 1% skim milk LB-PG agar plates (supplemented with 6 µg/ml chloramphenicol). Some protease-positive colonies were further analyzed by DNA sequencing of the insert to confirm the correct DNA sequence, and one strain for each construct was selected and denoted B. subtilis Sav-L2, B. subtilis Sav-L2 HVO, and B. subtilis Sav-L2 HV1, respectively.

Example 7. Fermentation yields of the Bacillus strains of example 6. 

The three B. subtilis strains of Example 6 were fermented on a rotary shaking table in 500 ml baffled Erlenmeyer flasks containing 100 ml TY supplemented with 6 mg/l chloramphenicol. Six Erlenmeyer flasks for each of the three B. subtilis strains were fermented in parallel. Two of the six Erlenmeyer flasks were incubated at 37°C (250 rpm), two at 30°C (250 rpm), and the last two at 26°C (250 rpm). A sample was taken from each shake flask at days 1, 2 and 3 and analyzed for proteolytic activity. The results are shown in Tables 6-8. As can be seen from Tables 6-8, the tails also increase the expression level of the Sav-L2 protease from Nocardiopsis dassonvillei subsp. dassonvillei DSM 43235 when expressed in B. subtilis. An increase of up to 40% is observed in this experiment, and an overall improvement is observed for both tail variants at all three temperatures tested.



Table 6. Relative proteolytic activities at 37°C.

              Day 1   Day 2   Day 3
Sav-L2        1.0     1.0     1.0
Sav-L2 HV1    1.4     1.3     1.2
Sav-L2 HVO    1.3     1.1     1.4


Table 7. Relative proteolytic activities at 30°C.

              Day 1   Day 2   Day 3
Sav-L2        1.0     1.0     1.0
Sav-L2 HV1    1.0     1.2     1.4
Sav-L2 HVO    1.1     1.3     1.3


Table 8. Relative proteolytic activities at 26°C.

              Day 1   Day 2   Day 3
Sav-L2        1.0     1.0     1.0
Sav-L2 HV1    1.3     1.1     1.1
Sav-L2 HVO    0.2     1.1     1.1



Example 8. 10R tail-variants with heterologous pro-regions in Bacillus 

The DNA sequence coding for the pro-region of the L2 protease from Nocardiopsis dassonvillei subsp. dassonvillei DSM 43235 is shown in SEQ ID NO: 29, and the corresponding amino acid sequence is shown in SEQ ID NO: 30. A Bacillus subtilis strain denoted L210R, similar to the Bacillus subtilis strain 10RS but with the DNA coding for the pro-region of L2 replacing the pro-region of 10RS, was made. The entire L210R protease encoding sequence, including the pro-region of L2, is shown in SEQ ID NO: 31.

Two tail variants of the above construct were also made. Tail variant HVO was constructed to have 8 extra amino acids at the C-terminus, QSHVQSAP (SEQ ID NO: 3), with the DNA shown in SEQ ID NO: 4 inserted in front of the TAA stop codon of the encoding sequence. Tail variant HV1 was constructed to have 4 extra amino acids at the C-terminus, QSAP (SEQ ID NO: 5), with the DNA sequence shown in SEQ ID NO: 6 inserted in front of the TAA stop codon of the encoding sequence.

The L210R construct and the two tail variants were integrated by homologous recombination into the Bacillus subtilis MB1053 host cell genome. Chloramphenicol-resistant transformants were checked for protease activity on 1% skim milk LB-PG agar plates (supplemented with 6 µg/ml chloramphenicol). Some protease-positive colonies were further analyzed by DNA sequencing of the insert to confirm the correct DNA sequence, and one strain for each construct was selected and denoted B. subtilis L210R, B. subtilis L210R HVO, and B. subtilis L210R HV1, respectively.






Example 9. Fermentation yields of 10R tail-variants with heterologous pro-region

The six B. subtilis strains 10RS, 10RS HVO, 10RS HV1, L210R, L210R HVO, and L210R HV1 were fermented on a rotary shaking table in 500 ml baffled Erlenmeyer flasks containing 100 ml TY supplemented with 6 mg/l chloramphenicol. Six Erlenmeyer flasks for each of the B. subtilis strains were fermented in parallel. Two of the six Erlenmeyer flasks were incubated at 37°C (250 rpm), two at 30°C (250 rpm), and the last two at 26°C (250 rpm). A sample was taken from each shake flask at days 1, 2 and 3 and analyzed for proteolytic activity. The results are shown in figure 1, and in tables 9-11. As can be seen from the results, exchanging the pro-region of 10R for the pro-region of the L2 protease resulted in a surprisingly high improvement in the expression level of the 10R protease, as measured by proteolytic activity in the culture broth at 37°C. The effect is most pronounced in the two tail variants.

Table 9. Relative proteolytic activities at 37°C.

              Day 1   Day 2   Day 3
10RS          1.0     1.0     1.0
10RS HVO      3.7     8.9     3.5
10RS HV1      3.9     8.5     4.3
L210R         1.9     2.3     1.6
L210R HVO     5.3     14.4    7.3
L210R HV1     9.1     20.9    7.6
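Because every value in Table 9 is normalised to 10RS on the same day, the contribution of the pro-region exchange alone can be estimated by dividing each L210R value by the 10RS value carrying the same tail. The sketch below does this for the 37°C data; it is a simple recalculation from the rounded table values, and the function name is hypothetical.

```python
# Relative activities at 37 °C from Table 9 (day 1, day 2, day 3), normalised to 10RS.
TABLE_9 = {
    "10RS": (1.0, 1.0, 1.0),
    "10RS HVO": (3.7, 8.9, 3.5),
    "10RS HV1": (3.9, 8.5, 4.3),
    "L210R": (1.9, 2.3, 1.6),
    "L210R HVO": (5.3, 14.4, 7.3),
    "L210R HV1": (9.1, 20.9, 7.6),
}


def swap_effect(table: dict, tail: str = "") -> tuple:
    """Per-day ratio of L2 pro-region to native pro-region for one tail background."""
    suffix = f" {tail}" if tail else ""
    native = table["10RS" + suffix]
    swapped = table["L210R" + suffix]
    return tuple(round(s / n, 2) for s, n in zip(swapped, native))


for tail in ("", "HVO", "HV1"):
    print(tail or "no tail", swap_effect(TABLE_9, tail))
```

On these numbers the L2 pro-region gives roughly 1.4- to 2.5-fold more activity than the native 10R pro-region at 37°C in all three tail backgrounds.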



Table 10. Relative proteolytic activities at 30°C.

              Day 1   Day 2   Day 3
10RS          1.0     1.0     1.0
10RS HVO      2.8     3.1     (illegible)
10RS HV1      3.6     3.6     (illegible)
L210R         0.6     0.4     (illegible)
L210R HVO     3.5     3.2     4.5
L210R HV1     3.7     3.2     4.5


Table 11. Relative proteolytic activities at 26°C.

              Day 1   Day 2   Day 3
10RS          1.0     1.0     1.0
10RS HVO      2.6     3.0     2.8
10RS HV1      3.7     3.3     3.1
L210R         0.4     0.7     0.4
L210R HVO     2.3     2.1     1.9
L210R HV1     2.2     1.7     1.7



Example 10. Repeat of examples 1-9 with other 10R-like proteases

Analogously to Examples 1 through 9 above, similar experiments are carried out with the proteases of the following Nocardiopsis strains:
(a) Nocardiopsis dassonvillei NRRL 18133, as described in WO 88/03947;

(b) Nocardiopsis sp. NRRL 18262, as described in WO 88/03947; the DNA and amino acid sequences of the protease derived from Nocardiopsis sp. NRRL 18262 are shown in DK patent application no. 1996 00013, and WO 01/58276 describes the use in animal feed of acid-stable proteases related to the protease derived from Nocardiopsis sp. NRRL 18262;

(c) Nocardiopsis alba DSM 15647; the amino acid sequence of the protease is SEQ ID NO: 33, the encoding nucleotide sequence is SEQ ID NO: 32; the gene is isolated from the genomic DNA of this strain by PCR amplification using the two primers:

1421 (SEQ ID NO: 34): gttcatcgatcgcatcggctgcgaccggccccctcccccagtc 
1604 (SEQ ID NO: 35): gcggatcctatcaggtgcgcagggtcagacc. 

(d) Nocardiopsis prasina DSM 15648; the amino acid sequence of the protease is SEQ ID NO: 
37, the encoding nucleotide sequence is SEQ ID NO: 36; the gene is isolated from the 
genomic DNA of this strain by PCR-amplification using the two primers: 

1346 (SEQ ID NO: 38): gttcatcgatcgcatcggctgccaccggaccgctcccccagtc

1602 (SEQ ID NO: 39): gcggatcctattaggtccggagacggacgccccaggag. 

(e) Nocardiopsis prasina DSM 15649; the amino acid sequence of the protease is SEQ ID NO: 
41, the encoding nucleotide sequence is SEQ ID NO: 40; the gene is isolated from the 
genomic DNA of this strain by PCR-amplification using the two primers: 

1603 (SEQ ID NO: 42): gttcatcgatcgcatcggctgccaccggaccactcccccagtc, and 1602 (SEQ ID NO: 39).

Example 11. In vitro monogastric performance of a 10R-like protease from DSM 43235

The performance of the Nocardiopsis dassonvillei subsp. dassonvillei DSM 43235 protease was assayed in a monogastric in vitro digestion model. A purified preparation of the mature part of the protease having SEQ ID NO: 28 (prepared as described above) was tested in an in vitro model simulating digestion in monogastric animals. In particular, the protease was tested for its ability to improve solubilisation and digestion of maize/SBM (maize/soybean meal) proteins. In the tables below, this protease is designated "protease of the invention".

The in vitro system consisted of 15 flasks in which maize/SBM substrate was initially incubated with HCl/pepsin (simulating gastric digestion) and subsequently with pancreatin (simulating intestinal digestion). Ten of the flasks were dosed with the protease at the start of the gastric phase, whereas the remaining 5 flasks served as blanks. At the end of the intestinal incubation phase, samples of in vitro digesta were removed and analysed for solubilised and digested protein.



Table 12: Outline of in vitro digestion procedure

Components added                                      pH    Temperature   Time course   Simulated digestion phase
10 g maize/SBM substrate (6:4), 41 ml HCl (0.105 M)   3.0   40°C          t=0 min       Mixing
5 ml HCl (0.105 M)/pepsin (3000 U/g substrate),       3.0   40°C          t=30 min      Gastric digestion
  1 ml protease of the invention
16 ml H2O                                             3.0   40°C          t=1.0 hour    Gastric digestion
7 ml NaOH (0.39 M)                                    6.8   40°C          t=1.5 hours   Intestinal digestion
5 ml NaHCO3 (1 M)/pancreatin (8 mg/g diet)            6.8   40°C          t=2.0 hours   Intestinal digestion
Terminate incubation                                  7.0   40°C          t=6.0 hours





Conditions
Substrate:     4 g SBM, 6 g maize (premixed)
pH:            3.0 stomach step/6.8-7.0 intestinal step
HCl:           0.105 M for 1.5 hours (i.e. 30 min HCl-substrate premixing)
Pepsin:        3000 U/g diet for 1 hour
Pancreatin:    8 mg/g diet for 4 hours
Temperature:   40°C
Replicates:    5



Solutions
0.39 M NaOH
0.105 M HCl
0.105 M HCl containing 6000 U pepsin per 5 ml
1 M NaHCO3 containing 16 mg pancreatin per ml
125 mM NaAc-buffer, pH 6.0



Enzyme protein determinations 

The amount of protease enzyme protein (in what follows, Enzyme Protein is abbreviated EP) is calculated on the basis of the A280 values and the amino acid sequences (amino acid compositions) using the principles outlined in S.C. Gill & P.H. von Hippel, Analytical Biochemistry 182, 319-326 (1989).
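The Gill & von Hippel approach estimates a molar extinction coefficient at 280 nm from the numbers of Trp, Tyr and cystine residues and converts an absorbance reading into a protein concentration via the Beer-Lambert law. The sketch below uses the residue coefficients from that paper as we recall them (5690, 1280 and 120 M^-1 cm^-1 for Trp, Tyr and cystine). It assumes every Cys is disulfide-bonded and a 1 cm light path, and it omits the paper's correction for the difference between native and denatured protein absorbance. The function names are hypothetical.

```python
# Residue extinction coefficients at 280 nm (M^-1 cm^-1), after Gill & von Hippel (1989).
EPS_TRP = 5690
EPS_TYR = 1280
EPS_CYSTINE = 120


def molar_extinction_280(sequence: str) -> int:
    """Estimate the molar extinction coefficient at 280 nm from amino acid composition."""
    seq = sequence.upper()
    n_cystine = seq.count("C") // 2  # assumption: all cysteines are in disulfide bonds
    return EPS_TRP * seq.count("W") + EPS_TYR * seq.count("Y") + EPS_CYSTINE * n_cystine


def enzyme_protein_mg_per_ml(a280: float, sequence: str, mw_da: float,
                             path_cm: float = 1.0) -> float:
    """Convert an A280 reading into mg enzyme protein (EP) per ml (Beer-Lambert law)."""
    molar = a280 / (molar_extinction_280(sequence) * path_cm)  # mol/l
    return molar * mw_da  # g/l, numerically equal to mg/ml
```

For a hypothetical peptide "WWYCC" the estimate is 2 x 5690 + 1280 + 120 = 12780 M^-1 cm^-1; for the real enzymes the mature sequence and its molecular mass would be used instead.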

Experimental procedure for in vitro model

The experimental procedure was according to the above outline. pH was measured at times 1, 2.5, and 5.5 hours. Incubations were terminated after 6 hours, and samples of 30 ml were removed and placed on ice before centrifugation (10000 x g, 10 min, 4°C). Supernatants were removed and stored at -20°C.



Analysis 

All samples were analysed for % degree of hydrolysis (DH) with the OPA method, as well as for content of solubilised and digested protein using gel filtration.

DH determination by the OPA method

The Degree of Hydrolysis (DH) of protein in different samples was determined using a semi-automated, microtiter plate based colorimetric method (Nielsen, P.M.; Petersen, D.; Dambmann, C. Improved method for determining food protein degree of hydrolysis. J. Food Sci. 2001, 66, 642-646). The OPA reagent was prepared as follows: 7.620 g di-Na tetraborate decahydrate and 200 mg sodium dodecyl sulphate (SDS) were dissolved in 150 ml deionized water. The reagents were completely dissolved before continuing. 160 mg o-phthaldialdehyde 97% (OPA) was dissolved in 4 ml ethanol. The OPA solution was transferred quantitatively to the above-mentioned solution by rinsing with deionized water. 176 mg dithiothreitol 99% (DTT) was added to the solution, which was made up to 200 ml with deionized water. A serine standard (0.9516 meqv/l) was prepared by solubilising 50 mg serine (Merck, Germany) in 500 ml deionized water.

The sample solution was prepared by diluting each sample to an absorbance (280 nm) of about 0.5. Generally, supernatants were diluted (100x) using an automated Tecan dilution station (Männedorf, Switzerland). All other spectrophotometer readings were performed at 340 nm using deionized water as the control. 25 µl of sample, standard and blank was dispensed into a microtiter plate. The microtiter plate was inserted into an iEMS MF reader (Labsystems, Finland) and 200 µl of OPA reagent was automatically dispensed. Plates were shaken (2 min; 700 rpm) before measuring absorbance. Finally, the DH was calculated. Eightfold determination of all samples was carried out.
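The serine standard concentration follows from 50 mg serine in 500 ml (molar mass of serine about 105.09 g/mol), which reproduces the stated 0.9516 meqv/l. The DH step below is a sketch of the calculation scheme we understand the cited Nielsen et al. (2001) method to use: absorbances are converted to serine-NH2 equivalents against the standard, corrected with substrate-specific constants alpha and beta, and divided by the total number of peptide bonds h_tot. These constants are left as parameters because this document does not state them; the function names are hypothetical.

```python
SERINE_MW = 105.09  # g/mol, L-serine


def serine_standard_meqv_per_l(mg_serine: float, volume_ml: float) -> float:
    """Concentration of the serine standard in meqv/l (one amino group per serine)."""
    return (mg_serine / SERINE_MW) / (volume_ml / 1000.0)


def degree_of_hydrolysis(od_sample, od_standard, od_blank, volume_l, sample_g,
                         protein_pct, alpha, beta, h_tot, standard_meqv_per_l=0.9516):
    """Degree of hydrolysis (%) from OPA absorbances at 340 nm (sketch).

    alpha, beta and h_tot are substrate-specific constants that must be supplied.
    """
    # Serine-NH2 equivalents (meqv) per g protein, scaled against the serine standard.
    serine_nh2 = ((od_sample - od_blank) / (od_standard - od_blank)
                  * standard_meqv_per_l * volume_l * 100.0 / (sample_g * protein_pct))
    h = (serine_nh2 - beta) / alpha  # hydrolysed peptide bonds, meqv/g protein
    return 100.0 * h / h_tot
```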


Estimation of solubilised and digested protein

The content of solubilised protein in supernatants from in vitro digested samples was estimated by quantifying crude protein (CP) using gel filtration HPLC. Supernatants were thawed, filtered through 0.45 µm polycarbonate filters and diluted (1:50, v/v) with H2O. Diluted samples were chromatographed by HPLC using a Superdex Peptide PE (7.5 x 300 mm) gel filtration column (Global). The eluent used for isocratic elution was 50 mM sodium phosphate buffer (pH 7.0) containing 150 mM NaCl. The total volume of eluent per run was 26 ml and the flow rate was 0.4 ml/min. Elution profiles were recorded at 214 nm and the total area under the profiles was determined by integration. To estimate protein content from integrated areas, a calibration curve (R = 0.9993) was made from a dilution series of an in vitro digested reference maize/SBM sample with known total protein content. The protein determination in this reference sample was carried out using the Kjeldahl method (determination of % nitrogen; AOAC (1984) Official Methods of Analysis, 14th ed., Washington DC).

The content of digested protein was estimated by integrating the chromatogram area corresponding to peptides and amino acids having a molecular mass of 1500 Dalton or below (Savoie, L.; Gauthier, S.F. Dialysis Cell For The In-vitro Measurement Of Protein Digestibility. J. Food Sci. 1986, 51, 494-498; Babinszky, L.; Van der Meer, J.M.; Boer, H.; Den Hartog, L.A. An In-vitro Method for Prediction of The Digestible Crude Protein Content in Pig Feeds. J. Sci. Food Agr. 1990, 50, 173-178; Boisen, S.; Eggum, B.O. Critical Evaluation of In-vitro Methods for Estimating Digestibility in Simple-Stomach Animals. Nutrition Research Reviews 1991, 4, 141-162). To determine the 1500 Dalton dividing line, the gel filtration column was calibrated using cytochrome C (Boehringer, Germany), aprotinin, gastrin I, and substance P (Sigma Aldrich, USA) as molecular mass standards.
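Gel filtration columns are commonly calibrated by fitting log10(molecular mass) of the standards as a straight line against elution volume; the 1500 Dalton dividing line is then the elution volume at which the fitted line reaches log10(1500), and the "digested" area is everything eluting after it. The sketch below assumes such a log-linear calibration. The molecular masses are approximate literature values for the four standards, and the elution volumes are hypothetical placeholders, not measurements from this study; the function names are also hypothetical.

```python
import math

# Approximate molecular masses (Da) of the calibration standards.
STANDARDS_DA = {
    "cytochrome C": 12384,
    "aprotinin": 6512,
    "gastrin I": 2098,
    "substance P": 1348,
}


def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def cutoff_volume(elution_ml, masses_da, cutoff_da=1500.0):
    """Elution volume (ml) corresponding to cutoff_da on a log-linear calibration."""
    a, b = fit_line(elution_ml, [math.log10(m) for m in masses_da])
    return (math.log10(cutoff_da) - a) / b


# Hypothetical elution volumes (ml), one per standard, for illustration only.
volumes = [10.2, 11.4, 13.5, 14.3]
v_cut = cutoff_volume(volumes, list(STANDARDS_DA.values()))
print(f"Peptides <= 1500 Da elute after about {v_cut:.2f} ml")
```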



Results 

The results shown in Tables 13 and 14 below indicate that the protease significantly increased the degree of hydrolysis (DH) as well as the soluble and digestible protein.
Table 13: Degree of Hydrolysis (DH), absolute and relative values

Enzyme (dosage in mg EP/kg feed)   n   Of total protein     Relative to blank
                                       %DH       SD         %DH       %CV
Blank                              5   26.84 a   0.69       100.0 a   2.57
Protease of the invention (100)    5   28.21 b   0.35       105.1 b   1.25

Different letters within the same column indicate significant differences (1-way ANOVA, Tukey-Kramer test, P<0.05). SD = Standard Deviation. %CV = Coefficient of Variation = (SD/mean value) x 100%
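The relative and %CV columns follow directly from the absolute means and standard deviations. The sketch below recomputes them for Table 13 using the definitions in the table footnote; the function names are hypothetical, and the significance letters (ANOVA with Tukey-Kramer test) cannot be recomputed because the individual flask values are not reported.

```python
def relative_to_blank(value: float, blank_mean: float) -> float:
    """Express a treatment mean as a percentage of the blank mean."""
    return 100.0 * value / blank_mean


def coefficient_of_variation(sd: float, mean: float) -> float:
    """%CV = (SD / mean value) x 100%."""
    return 100.0 * sd / mean


# Values from Table 13 (%DH of total protein, n = 5).
blank_mean, blank_sd = 26.84, 0.69
enzyme_mean, enzyme_sd = 28.21, 0.35

print(round(relative_to_blank(enzyme_mean, blank_mean), 1))      # 105.1
print(round(coefficient_of_variation(blank_sd, blank_mean), 2))  # 2.57
```

Recomputed from the rounded table entries, the enzyme row gives a %CV of 1.24 rather than the printed 1.25; an SD anywhere between 0.345 and 0.355 (which all round to 0.35) covers both values.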



Table 14: Solubilised and digested crude protein measured by AKTA HPLC

Enzyme (dosage in mg EP/kg feed)   n   Of total protein                    Relative to blank
                                       %dig. CP  SD    %sol. CP  SD        %dig. CP  CV%    %sol. CP  CV%
Blank                              5   54.1 a    1.1   90.1 a    1.1       100.0 a   2.0    100.0 a   1.2
Protease of the invention (50)     5   57.7 b    1.1   93.2 b    1.4       106.7 b   1.9    103.4 b   1.5
Protease of the invention (100)    5   58.9 b    0.8   94.8 b    0.9       108.9     1.3    105.2 b   0.9

Different letters within the same column indicate significant differences (1-way ANOVA, Tukey-Kramer test, P<0.05). SD = Standard Deviation. %CV = Coefficient of Variation = (SD/mean value) x 100%



Example 12. In vitro aquaculture performance of 10R-like protease from DSM 43235 

The performance of the protease from Nocardiopsis dassonvillei subsp. dassonvillei DSM 43235 was assessed in an aquaculture in vitro model. The protease preparation as described in Example 3 was tested in an aquaculture in vitro model simulating digestion in coldwater fish. The in vitro system consisted of 15 flasks in which SBM substrate was initially incubated with HCl/pepsin (simulating gastric digestion) and subsequently with pancreatin (simulating intestinal digestion). Ten of the flasks were dosed with the protease at the start of the gastric phase, whereas the remaining 5 flasks served as blanks. At the end of the intestinal incubation phase, samples of in vitro digesta were removed and analysed for solubilised and digested protein.

Table 15: Outline of aqua in vitro digestion procedure

Components added                                     pH    Temperature   Time course   Simulated digestion phase
10 g extruded SBM substrate, 62 mL HCl (0.155 M)/    3.0   15°C          t=0 min       Gastric digestion
  pepsin (4000 U/g substrate), 1 mL of the
  protease of the invention
7 mL NaOH (1.1 M)                                    6.8   15°C          t=6 hours     Intestinal digestion
5 mL NaHCO3 (1 M)/pancreatin (8 mg/g diet)           6.8   15°C          t=7 hours     Intestinal digestion
Terminate incubation                                 7.0   15°C          t=24 hours



Conditions
Substrate:     10 g extruded SBM
pH:            3.0 stomach step/6.8-7.0 intestinal step
HCl:           0.155 M for 6 hours
Pepsin:        4000 U/g diet for 6 hours
Pancreatin:    8 mg/g diet for 17 hours
Temperature:   15°C
Replicates:    5



Solutions
1.1 M NaOH
0.155 M HCl/pepsin (4000 U/g diet)
1 M NaHCO3 containing 16 mg pancreatin/mL
125 mM NaAc-buffer, pH 6.0

Experimental procedure for aqua in vitro model

The experimental procedure was according to the above outline. pH was measured at times 1.5, 8 and 23 hours. Incubations were terminated after 24 hours, and samples of 30 mL were removed and placed on ice before centrifugation (13000 x g, 10 min, 0°C). Supernatants were removed and stored at -20°C.

Analysis

All supernatants were analysed using the OPA method (% degree of hydrolysis) and by AKTA HPLC to determine solubilised and digested protein (see the monogastric example).

Pre-treatment of in vitro supernatants with EASY SPE columns

Before analysis on AKTA HPLC, supernatants from the in vitro system were pretreated using solid-phase sample purification. This was done to improve the chromatography and thereby prevent unstable elution profiles and baselines. The columns used for extraction were solid-phase extraction columns (Chromabond EASY SPE columns from Macherey-Nagel). 2 mL milliQ water was eluted through the columns by use of a vacuum chamber (vacuum 0.15 x 100 kPa). Subsequently, 3 mL in vitro sample was dispensed onto the column and eluted (vacuum 0.1 x 100 kPa); the first ½ mL of eluted sample was discarded, a clean tube was placed beneath the column, and the rest of the sample was eluted and saved for further dilution.

Results 

The results shown in Tables 16 and 17 below indicate that the protease significantly increased the degree of hydrolysis and protein digestibility.

Table 16: Degree of hydrolysis (DH) by the OPA method, absolute and relative values

Enzyme (mg EP/kg diet)            n   Of total protein     Relative to blank
                                      %DH       SD         %DH       %CV
Blank                             5   21.30 a   0.52       100.0 a   2.42
Protease of the invention (50)    5   21.98 b   0.22       103.2     1.00

Different letters within the same column indicate significant differences (1-way ANOVA, Tukey-Kramer test, P<0.05). SD = Standard Deviation. %CV = Coefficient of Variation = (SD/mean value) x 100%



Table 17: Solubilised and digested crude protein measured by AKTA HPLC, absolute and relative values

Enzyme (mg EP/kg diet)            n   Of total protein                    Relative to blank
                                      %CP dig   SD    %CP sol   SD        %CP dig   %CV    %CP sol   %CV
Blank                             5   50.0 b    2.2   89.9      3.2       100.0 a   4.5    100.0     3.5
Protease of the invention (50)    5   52.3      1.1   91.4 a    1.5       104.8 b   2.1    101.7     1.6
Protease of the invention (100)   5   53.4      0.4   91.6      1.0       107.0     0.7    101.9     1.1

Different letters within the same column indicate significant differences (1-way ANOVA, Tukey-Kramer test, P<0.05). SD = Standard Deviation. %CV = Coefficient of Variation = (SD/mean value) x 100%.



Example 13. Fermentation and activity of 10R tail-variants TQ and TP with Savinase signal

Two of the B. subtilis strains of Example 5, strain 209 with the amino acid tail-variant TQ and strain 211 with the tail-variant TP, together with B. subtilis Sav-10RS, were fermented on a rotary shaking table in 500 ml baffled Erlenmeyer flasks containing 100 ml TY supplemented with 6 mg/l chloramphenicol. Twelve Erlenmeyer flasks for each of the three B. subtilis strains were fermented in parallel. Four of the twelve Erlenmeyer flasks were incubated at 37°C (250 rpm), four at 30°C (250 rpm), and the last four at 26°C (250 rpm). A sample was taken from each shake flask at days 1, 2 and 3 and analyzed for proteolytic activity. The results are shown in Tables 18 to 20 below.

As can be seen from the tables below, the 2-amino-acid tails give a surprisingly high improvement in the yield of the protease, as measured by activity in the culture broth. The effect of the 2-amino-acid tails is comparable to the effect observed for Sav-10RS HV1 and Sav-10RS HV3 in Example 1.



Table 18: Relative proteolytic activities at 37°C.

              Day 1   Day 2   Day 3
10Rsynt-15    1.0     1.0     1.0
209           7.0     7.0     6.0
211           7.2     7.7     4.9



Table 19: Relative proteolytic activities at 30°C.

              Day 1   Day 2   Day 3
10Rsynt-15    1.0     1.0     1.0
209           4.5     3.6     4.9
211           4.0     4.1     5.0



Table 20: Relative proteolytic activities at 26°C.

              Day 1   Day 2   Day 3
10Rsynt-15    1.0     1.0     1.0
209           6.4     4.3     4.0
211           3.7     4.1     4.2

Example 14. Synthetic shuffled 10R-like protease tail-variants with signal

Another synthetic tail-variant 10R protease encoding gene, denoted G-MAT-22, was constructed with a signal peptide and the 8-amino-acid C-terminal tail of SEQ ID NO: 3, and introduced into a Bacillus subtilis host for expression. A surprisingly high yield of protease was achieved (data not shown). The full coding DNA sequence of G-MAT-22 is shown in SEQ ID NO: 44, and the encoded pre-pro-protease is shown in SEQ ID NO: 45. The G-MAT-22 protease is an alpha-lytic protease-like enzyme (peptidase family S1E; old notation: S2A). This protease has a higher temperature optimum (at pH 9) than the 10R protease, as shown in Figure 1.

Example 15. Shuffled Pro-sequences of 10R-like Proteases

Recombination of protease genes can be made independently of the specific sequence of the parents by synthetic shuffling as described in Ness, J.E. et al. 2002 [Nature Biotechnology, Vol. 20 (12), pp. 1251-1255, 2002]. Synthetic oligonucleotides, degenerate in their DNA sequence so that all amino acids found in the set of parent proteases can be encoded, are designed, and the genes are assembled according to the reference. The shuffling can be carried out for the full-length sequence, or for only part of the sequence, which is later combined with the rest of the gene to give a full-length sequence.
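Designing the degenerate oligonucleotides starts from knowing which amino acids occur at each position across the aligned parents. The sketch below collects that per-position diversity and the size of the resulting combinatorial space. The parent fragments are made-up placeholders, not the pro-peptides of the SEQ ID NOs used in this example, gaps in a real alignment would be counted as residues here, and the step of choosing degenerate codons that cover each set is not shown. The function names are hypothetical.

```python
from math import prod


def position_diversity(aligned: list[str]) -> list[set[str]]:
    """Amino acids observed at each position of equal-length aligned parent sequences."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("parent sequences must be aligned to equal length")
    return [set(column) for column in zip(*aligned)]


def combinatorial_size(diversity: list[set[str]]) -> int:
    """Number of distinct sequences if every observed residue is allowed at every position."""
    return prod(len(s) for s in diversity)


# Hypothetical aligned pro-peptide fragments, for illustration only.
parents = ["ATGVRLT", "ATSVRLS", "AQGVKLT"]
diversity = position_diversity(parents)
variable = [(i, sorted(s)) for i, s in enumerate(diversity) if len(s) > 1]
print(variable, combinatorial_size(diversity))
```

Only the variable positions need degenerate codons; conserved positions can be written with a single codon in every oligonucleotide.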

In this example, the amino acid sequence of the pro-peptide part of the parent proteases given in SEQ ID NO: 28; SEQ ID NO: 33; SEQ ID NO: 37; SEQ ID NO: 41; SEQ ID NO: 43; or SEQ ID NO: 45 is encoded by a set of oligonucleotides, and the resulting shuffled gene fragments are combined into the context of the full-length protease gene, which then consists of DNA coding for the signal sequence, the (shuffled) pro-peptide and, in this case, the mature protein of the 10R protease. Examples of shuffled pro-peptide sequences are shown in SEQ ID NO: 46 (0-2 19), SEQ ID NO: 47 (G-2.73), SEQ ID NO: 48 (G-1.43), SEQ ID NO: 49 (G-2.6), SEQ ID NO: 50 (G-2.5), SEQ ID NO: 51 (G-2.3), SEQ ID NO: 52 (G-1.4), and SEQ ID NO: 53 (G-1.2).

The complete protease encoding genes were inserted into the genome of B. subtilis by homologous recombination as described above, and the proteases were expressed in shake flasks using a rich medium. The fermentation was carried out for 5 days at 30°C, and the supernatant was isolated by centrifugation prior to measuring the protease activity. As a control, a B. subtilis clone expressing the wild-type protease 10R from Nocardiopsis sp. NRRL 18262, made by an identical construction protocol, was fermented under the same conditions. The protease activity was calculated and is presented in the table below relative to the activity of the wild-type 10R protease. Clearly, the heterologous pro-regions provide an advantage over the native pro-region of the 10R protease.

Construct   Rel. activity
10R         1.0
G-1.2       2.9
G-1.4       1.4
G-2.3       1.6
G-2.4       3.4
G-2.5       3.6
G-2.6       4.2
G-2.7       1.6

Table 21: Relative activity of 10R protease expressed with heterologous shuffled pro-peptides.
Example 16. In vitro monogastric performance of tail-variant 10R-HV1

This example describes a dose/response study with the four-amino-acid tail variant HV1 of the 10R protease in the monogastric in vitro model, using 10, 25, 50 and 100 mg EP/kg and using the 10R protease as benchmark or control. The tail variant 10R-HV1 was constructed to have 4 extra amino acids at the C-terminus, QSAP (SEQ ID NO: 5), as described above.
In vitro conditions:

Substrate:     4 g SBM, 6 g maize (premixed)
pH:            3.0 stomach step/6.8-7.0 intestine step
HCl:           0.105 M for 1.5 hours (i.e. 30 min HCl-substrate premixing)
Pepsin:        3000 U/g diet for 1 hour
Pancreatin:    8 mg/g diet for 4 hours
Incubation:    40°C
Replica:       5



Enzymes: 

10R protease: FFE-2003-00047; batch PPA21400; 154 mg EP/g product 

Freeze-dried 10R-HV1: FFE-2003-00077; 370 mg EP/g product

Solution A: 10R, 100 mg EP/kg diet:

100 mg EP/kg diet ~ 1 mg EP/flask => 1 mg EP/mL

(1 mg EP/mL * 10 mL)/154 mg EP/g product = 0.0649 g

Prepare 10 mL: Dissolve 0.0649 g enzyme in 10 mL NaAc buffer.

Solution C: 10R-HV1, 100 mg EP/kg diet:

100 mg EP/kg diet ~ 1 mg EP/flask => 1 mg EP/mL

(1 mg EP/mL * 20 mL)/370 mg EP/g product = 0.05405 g

Prepare 20 mL: Dissolve 0.0541 g enzyme in 20 mL NaAc buffer.

Solution D: 10R-HV1, 50 mg EP/kg diet:
50 mg EP/kg diet ~ 0.50 mg EP/flask via 1 ml = 0.50 mg EP/ml
Prepare 10 mL: Dilute C 2 times: 5 ml solution C + 5 ml 125 mM NaAc-buffer

Solution E: 10R-HV1, 25 mg EP/kg diet:

25 mg EP/kg diet ~ 0.25 mg EP/flask via 1 ml = 0.25 mg EP/ml

Prepare 12 mL: Dilute C 4 times: 3 ml solution C + 9 ml 125 mM NaAc-buffer

Solution F: 10R-HV1, 10 mg EP/kg diet:

10 mg EP/kg diet ~ 0.10 mg EP/flask via 1 ml = 0.10 mg EP/ml

Prepare 10 mL: Dilute C 10 times: 1 ml solution C + 9 ml 125 mM NaAc-buffer 
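The solution recipes above follow from the dose per kg diet, the 10 g of diet per flask, the 1 ml of enzyme solution added per flask, and the EP content of each product. The sketch below reproduces that arithmetic, assuming those three quantities as stated in this example; the function names are hypothetical.

```python
DIET_PER_FLASK_KG = 0.010  # 10 g diet per flask
DOSE_VOLUME_ML = 1.0       # 1 ml enzyme solution added per flask


def required_concentration(dose_mg_ep_per_kg: float) -> float:
    """mg EP per ml so that 1 ml of solution delivers the dose to one flask."""
    return dose_mg_ep_per_kg * DIET_PER_FLASK_KG / DOSE_VOLUME_ML


def product_grams(conc_mg_per_ml: float, volume_ml: float, ep_mg_per_g: float) -> float:
    """Grams of enzyme product to dissolve for a given EP concentration and volume."""
    return conc_mg_per_ml * volume_ml / ep_mg_per_g


def stock_volume_ml(target_conc: float, final_volume_ml: float, stock_conc: float) -> float:
    """Volume of stock solution to dilute, from C1 * V1 = C2 * V2."""
    return target_conc * final_volume_ml / stock_conc


conc_c = required_concentration(100)                   # Solutions A and C
print(round(product_grams(conc_c, 10, 154), 4))        # Solution A, 10R
print(round(product_grams(conc_c, 20, 370), 5))        # Solution C, 10R-HV1
for dose, final_ml in ((50, 10), (25, 12), (10, 10)):  # Solutions D, E, F
    print(dose, stock_volume_ml(required_concentration(dose), final_ml, conc_c))
```

The results match the recipes: 0.0649 g and 0.05405 g of product, and 5 ml, 3 ml and 1 ml of Solution C for Solutions D, E and F.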

Substrates:

Premix (40% SBM / 60% maize), FFS-2002-00121

The 10 g sample contains 6 g maize and 4 g SBM, giving a calculated protein content of 23.48% (~2.35 g protein/flask).



Chemicals:

4.005 M HCl, AT-1-00061/29
4.007 M NaOH, AT-1-00002/36
Pancreatin FFE-2002-00052, 8xUSP
Pepsin FFE-2003-00048, 471 U/mg


NaOH 0.39 M: 
Prepare 500 mL: 

48.97 mL 3.982 M NaOH, fill with milliQ to 500 mL. 

HCl1 solution, 0.105 M:
Prepare 2000 mL:

52.43 mL of 4.005 M HCl, fill with milliQ up to 2000 mL



HCl2 (HCl/pepsin) solution: 0.105 M containing 30000 U pepsin/5 mL
Prepare 250 mL:

Take out approx. 150 mL from the HCl1 solution, add 3.18 g pepsin and fill up to 250 mL with the HCl1 solution.



125 mM NaAc-buffer, pH 6.0:

Prepared from a 2 M NaAc-buffer (KLu 04-07-2003/lab book 14169 p. 104)
-> 12.5 mL 2 M NaAc-buffer, fill up to 200 mL with milliQ

Pancreatin dissolved in 1 M NaHCO3, containing 8 mg pancreatin/g diet:

The NaHCO3-pancreatin solution is premade, divided into portions and frozen. It was made on 29-04-2003 and frozen, and is slowly thawed in a refrigerator overnight. The stock preparation is described in lab book 14165, page 068.
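Each reagent recipe above is a C1 x V1 = C2 x V2 dilution or a unit calculation, so the stated volumes and weights can be rechecked from the stock concentrations. In the sketch below the NaOH stock is taken as 3.982 M, the value used in the NaOH recipe (the chemicals list gives 4.007 M), and 30000 U pepsin per 5 ml is read as 3000 U/g diet for the 10 g of diet in one flask; the helper names are hypothetical.

```python
def dilution_volume_ml(stock_molar: float, target_molar: float, final_ml: float) -> float:
    """Volume of concentrated stock needed (C1 * V1 = C2 * V2)."""
    return target_molar * final_ml / stock_molar


def pepsin_grams(units_per_5ml: float, final_ml: float, units_per_mg: float) -> float:
    """Grams of pepsin powder for a solution with a given activity per 5 ml."""
    total_units = units_per_5ml * final_ml / 5.0
    return total_units / units_per_mg / 1000.0


naoh_ml = dilution_volume_ml(3.982, 0.39, 500)   # NaOH 0.39 M, 500 mL
hcl_ml = dilution_volume_ml(4.005, 0.105, 2000)  # HCl1 0.105 M, 2000 mL
naac_ml = dilution_volume_ml(2.0, 0.125, 200)    # 125 mM NaAc-buffer, 200 mL
pepsin_g = pepsin_grams(30000, 250, 471)         # HCl2 (HCl/pepsin), 250 mL
pancreatin_mg_per_ml = 8.0 * 10.0 / 5.0          # 8 mg/g diet x 10 g diet in 5 ml
print(round(naoh_ml, 2), round(hcl_ml, 2), naac_ml, round(pepsin_g, 2),
      pancreatin_mg_per_ml)
```

The recalculated values agree with the recipes (48.97 mL, 52.43 mL, 12.5 mL, 3.18 g) and with the 16 mg pancreatin per ml used in the monogastric solutions.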



Flow scheme:

Premixing phase (t=0): 10 g substrate is mixed with 41 ml HCl1.
Gastric phase (t=30 min): 5 ml HCl2 (HCl/pepsin) + 1 ml enzyme solution (or buffer) is added.
t=1 h: the pH is measured and 16 ml water is added.
Intestinal phase (t=1 h 30 min): 7 ml 0.39 M NaOH is added.
t=2 h: 5 ml NaHCO3/pancreatin is added.
t=2 h 30 min and t=5 h 30 min: the pH is measured.
t=6 h: 30 ml suspension is sampled for centrifugation.

Each supernatant is immediately and carefully transferred from the centrifuge tube into glass
tubes and split into two aliquots for further analysis. The treatments are summarized in Table 22.
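The flow scheme can be written out as a timed schedule. In this Python sketch the `Step` structure is an illustrative assumption, not part of the protocol; it tabulates the additions and shows that 75 ml of liquid is added per flask before sampling.

```python
from dataclasses import dataclass


@dataclass
class Step:
    t_min: int        # time after start, in minutes
    phase: str
    action: str
    volume_ml: float  # liquid added at this step (0 for measurements/sampling)


SCHEDULE = [
    Step(0, "premixing", "10 g substrate + 41 ml HCl1", 41),
    Step(30, "gastric", "5 ml HCl2 (HCl/pepsin) + 1 ml enzyme or buffer", 6),
    Step(60, "gastric", "measure pH, add 16 ml water", 16),
    Step(90, "intestinal", "add 7 ml 0.39 M NaOH", 7),
    Step(120, "intestinal", "add 5 ml NaHCO3/pancreatin", 5),
    Step(150, "intestinal", "measure pH", 0),
    Step(330, "intestinal", "measure pH", 0),
    Step(360, "sampling", "take 30 ml suspension for centrifugation", 0),
]

total = 0.0
for step in SCHEDULE:
    total += step.volume_ml
    print(f"t={step.t_min:>3} min  {step.phase:<10} {step.action:<48} cumulative {total:g} ml")
```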



Table 22: Treatment of samples in the monogastric in vitro model.

Sample   Enzyme solution   Enzyme                   pH    Enzyme dose/kg diet   Pepsin U/g diet   Pancreatin mg/g diet
1-5      1 ml Buffer       Blank                    3.0   0 mg EP               3000              8.0
6-10     1 ml Solution A   10R (FFE-2003-00047)     3.0   100 mg EP             3000              8.0
11-15    1 ml Solution C   10R-HV1 (PPA22873)       3.0   100 mg EP             3000              8.0
16-20    1 ml Solution D   10R-HV1 (PPA22873)       3.0   50 mg EP              3000              8.0
21-25    1 ml Solution E   10R-HV1 (PPA22873)       3.0   25 mg EP              3000              8.0
26-30    1 ml Solution F   10R-HV1 (PPA22873)       3.0   10 mg EP              3000              8.0


Soluble and Digestible Protein

The changes in the levels of soluble and digestible crude protein in the soluble phase of 
the hydrolysates were determined using an AKTA HPLC (Superdex 30 peptide column). The 
results are shown in Table 23. 

At a 10R-HV1 dose of 100 mg EP/kg diet, the level of digestible protein was significantly
increased by 9.8% compared to Blank. The control 10R [100 mg EP/kg diet] showed a relative
improvement of 7.7%. With the lower enzyme doses (50, 25, and 10 mg EP/kg diet), the relative
improvements in digestible protein were 5.7%, 3.3%, and 0.7%, respectively.

Table 23: HPLC results with 10R-HV1 and 10R showing the percentage changes in digestible CP
and soluble CP relative to blank. Different letters indicate significant differences
(1-way ANOVA, Tukey, 95%).



MoFi030D43, day 2

                         Of total protein                     Relative to blank
Enzyme [mg EP/kg]   n    %dig. CP   SD    %sol. CP   SD       %dig. CP   CV%    %sol. CP   CV%
Blank               11   54.8       1.2   83.9       1.6      100.0      2.3    100.0      1.9
10R-HV1 [100]       5    60.2       0.8   88.3       1.1      109.8      1.3    105.3      1.3
10R-HV1 [50]        5    58.0       0.6   86.9       0.8      105.7      1.1    103.6      0.9
10R-HV1 [25]        5    56.7       0.8   86.3       1.0      103.3      1.4    102.9      1.2
10R-HV1 [10]        5    55.2       1.3   84.3       1.9      100.7      2.4    100.5      2.2
10R [100]           5    59.1       0.8   87.1       1.9      107.7      1.4    103.9      2.2



The original 10R [100 mg EP/kg diet] improved the level of soluble protein by about 4%.
The effect of 10R-HV1 [100 mg EP/kg diet] was slightly higher (5.3% relative increase) and
significant. With doses of 50 and 25 mg EP/kg diet, the relative improvements were 3.6% and 2.9%,
respectively, and significant. With 10 mg EP/kg diet, the relative improvement was 0.5%.
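The "relative to blank" and CV% columns of Table 23 are consistent with simple ratios: relative = 100 × treatment mean / blank mean, and CV% = 100 × SD / mean. The Python sketch below recomputes them from the rounded means and SDs printed in the table; because the inputs are rounded, results can differ from the printed values by about 0.1-0.2 (for example 109.9 instead of 109.8).

```python
# Sketch: recompute the relative and CV% columns of Table 23 from printed means/SDs.
BLANK_DIG, BLANK_SOL = 54.8, 83.9  # blank means: % digestible CP, % soluble CP

groups = {
    # name: (mean %dig. CP, SD, mean %sol. CP, SD)
    "10R-HV1 [100]": (60.2, 0.8, 88.3, 1.1),
    "10R-HV1 [50]": (58.0, 0.6, 86.9, 0.8),
    "10R-HV1 [25]": (56.7, 0.8, 86.3, 1.0),
    "10R-HV1 [10]": (55.2, 1.3, 84.3, 1.9),
    "10R [100]": (59.1, 0.8, 87.1, 1.9),
}


def relative_to_blank(value: float, blank: float) -> float:
    """Treatment mean expressed as a percentage of the blank mean."""
    return 100.0 * value / blank


def cv_percent(sd: float, mean: float) -> float:
    """Coefficient of variation in percent."""
    return 100.0 * sd / mean


for name, (dig, dig_sd, sol, sol_sd) in groups.items():
    print(
        f"{name:<14} dig {relative_to_blank(dig, BLANK_DIG):6.1f}  CV {cv_percent(dig_sd, dig):4.1f}  "
        f"sol {relative_to_blank(sol, BLANK_SOL):6.1f}  CV {cv_percent(sol_sd, sol):4.1f}"
    )
```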

Degree of Hydrolysis

The degree of hydrolysis (DH) was determined using the OPA method. Results are shown 
in Table 24. 
Enzyme [mg EP/kg]             n    Of total protein       Relative to blank
                                   %DH      SD            %DH      %CV      Group
Blank                         5    25.89    0.43          100.0    1.65     a
10R (FFE-2003-00047) [100]    5    27.19    0.67          105.0    2.46     bc
10R-HV1 [100]                 5    27.89    0.36          107.7    1.29     c
10R-HV1 [50]                  5    27.34    0.57          105.6    2.08     bc
10R-HV1 [25]                  5    26.42    0.57          102.0    2.16     ab
10R-HV1 [10]                  5    25.52    0.96          98.6     3.76     a



Table 24: Degree of Hydrolysis (DH) determined by the OPA method. Absolute as well as relative
values are shown. Different letters indicate significant differences (1-way ANOVA, Tukey, 95%).

At 100 mg EP/kg diet, the tail variant 10R-HV1 improved DH by 7.7% compared to Blank. With the
lower doses of the protease (50 and 25 mg EP/kg diet), the improvements were 5.6% and 2.0%,
respectively, in line with previous findings. At the lowest dose [10 mg EP/kg diet] no effect was
seen. The original 10R [100 mg EP/kg diet] showed an improvement of 5% relative to Blank.

The results of the HPLC (AKTA) analysis and the DH determinations clearly show that
addition of the four-amino-acid C-terminal tail (SEQ ID NO: 5) to 10R does not affect the
performance of the 10R protease to any significant extent.



CLAIMS

1. A secreted mature polypeptide which after maturation has protease activity, which polypeptide 
when expressed and before maturation comprises a heterologous pro-region, and which 
polypeptide: 

(a) comprises an amino acid sequence which is at least 70% identical to the amino acid 
sequence of the mature part of the polypeptide shown in SEQ ID NO: 28; SEQ ID NO: 
33; SEQ ID NO: 37; SEQ ID NO: 41; SEQ ID NO: 43; or SEQ ID NO: 45; 

(b) comprises an amino acid sequence which is at least 70% identical to the amino acid
sequence of the mature part of the polypeptide encoded by the polynucleotide in SEQ
ID NO: 1; SEQ ID NO: 2; SEQ ID NO: 25; SEQ ID NO: 31; SEQ ID NO: 32; SEQ ID
NO: 36; SEQ ID NO: 40; or SEQ ID NO: 44;

(c) comprises a mature part which is a variant of the mature part of the polypeptide having
the amino acid sequence of SEQ ID NO: 28; SEQ ID NO: 33; SEQ ID NO: 37; SEQ ID
NO: 41; SEQ ID NO: 43; or SEQ ID NO: 45, the segment comprising a substitution,
deletion, extension, and/or insertion of one or more amino acids;

(d) is an allelic variant of (a), (b), or (c); or

(e) is a fragment of (a), (b), (c), or (d).

2. The polypeptide according to claim 1, wherein the heterologous pro-region is derived
from a protease, preferably the pro-region is derived from an S2A or S1E protease, and most
preferably it is at least 70% identical to the pro-region shown in SEQ ID NO: 28, SEQ ID NO: 30,
SEQ ID NO: 33, SEQ ID NO: 37, SEQ ID NO: 41, SEQ ID NO: 43, SEQ ID NO: 45, SEQ ID NO:
47, SEQ ID NO: 48, SEQ ID NO: 49, SEQ ID NO: 50, SEQ ID NO: 51, SEQ ID NO: 52, or SEQ ID
NO: 53.



3. The polypeptide according to claim 1 or 2, which is a wildtype polypeptide; an artificial variant of
a wildtype polypeptide, said variant having one or more amino acid(s) added to the C-terminus as
compared to the wildtype; a shuffled polypeptide; or a protein-engineered polypeptide.

4. The polypeptide according to any of claims 1 - 3, which comprises at least three non-polar or
uncharged polar amino acids within the last four amino acids of the C-terminus of the polypeptide.

5. The polypeptide according to claim 3, wherein the one or more added amino acid(s) is (are)
non-polar or uncharged.



6. The polypeptide according to claim 5, wherein the one or more added amino acid(s) is one or 
more of Q, S, V, A, or P. 

7. The polypeptide according to claim 3, wherein the one or more added amino acids are selected
from the group consisting of: QSHVQSAP, QSAP, QP, TL, TT, QL, TP, LP, TI, IQ, QP, PI, LT, TQ,
IT, QQ, and PQ.

8. The polypeptide according to any of claims 1 - 7, which when expressed and before maturation
comprises a heterologous secretion signal peptide which is cleaved from the polypeptide when
the polypeptide is secreted; preferably, the heterologous secretion signal peptide is derived from a
heterologous protease.

9. The polypeptide according to claim 8, wherein the heterologous secretion signal peptide
comprises an amino acid sequence having a sequence identity of at least 70% with the amino
acid sequence encoded by polynucleotides 1 - 81 of SEQ ID NO: 2, or SEQ ID NO: 44.

10. An isolated polynucleotide encoding a polypeptide as defined in any of claims 1 - 9.

11. A recombinant expression vector or polynucleotide construct comprising a polynucleotide as 
defined in claim 10. 



12. A recombinant host cell comprising a polynucleotide as defined in claim 10, or an expression
vector or polynucleotide construct as defined in claim 11.

13. The recombinant host cell according to claim 12, which is a Bacillus cell.

14. A transgenic plant, or plant part, comprising a polynucleotide as defined in claim 10, or an 
expression vector or polynucleotide construct as defined in claim 11. 

15. A transgenic, non-human animal, or products, or elements thereof, comprising a 
polynucleotide as defined in claim 10, or an expression vector or polynucleotide construct as 
defined in claim 11. 



16. A method for producing a polypeptide as defined in any of claims 1 - 9, the method
comprising: (a) cultivating a recombinant host cell as defined in claim 12 or 13, or a transgenic
plant or animal as defined in claim 14 or 15, to produce a supernatant comprising the
polypeptide, and optionally (b) recovering the polypeptide.



17. An animal feed additive comprising at least one polypeptide as defined in any of claims 1 - 9; 
and 

(a) at least one fat-soluble vitamin, and/or 

(b) at least one water-soluble vitamin, and/or 

(c) at least one trace mineral. 

18. An animal feed composition having a crude protein content of 50 to 800 g/kg and comprising
at least one polypeptide as defined in any of claims 1 - 9, or at least one feed additive of claim 17. 

19. A composition comprising at least one polypeptide as defined in any of claims 1 - 9, together
with at least one other enzyme selected from amongst phytase (EC 3.1.3.8 or 3.1.3.26); xylanase
(EC 3.2.1.8); galactanase (EC 3.2.1.89); alpha-galactosidase (EC 3.2.1.22); protease (EC 3.4.-.-);
phospholipase A1 (EC 3.1.1.32); phospholipase A2 (EC 3.1.1.4); lysophospholipase (EC 3.1.1.5);
phospholipase C (EC 3.1.4.3); phospholipase D (EC 3.1.4.4); and/or beta-glucanase (EC 3.2.1.4 or
EC 3.2.1.6).

20. A method for using at least one polypeptide as defined in any of claims 1 - 9, for improving the 
nutritional value of an animal feed, for increasing digestible and/or soluble protein in animal diets, 
for increasing the degree of hydrolysis of proteins in animal diets, and/or for the treatment of 
vegetable proteins, the method comprising including the polypeptide(s) in animal feed, and/or in a
composition for use in animal feed. 

21. A method for using at least one polypeptide as defined in any of claims 1 - 9, comprising 
including the polypeptide(s) in a detergent formulation.
ABSTRACT

A secreted mature polypeptide derived from an S2A or S1E protease which after 
maturation has protease activity, which polypeptide when expressed and before maturation 
comprises a heterologous pro-region. 



SEQUENCE LISTING

<110> Lassen, Soren Flensted

<120> improved fusion proteases and methods for producing them 

<130> 10495.010-DK

<160> 53 

<170> PatentIn version 3.2

<210> 1 

<211> 1062 

<212> DNA 

<213> Nocardiopsis sp. NRRL 18262
<220> 

<221> misc_feature 

<222> (1)..(495)

<223> Encodes pro-region shown in positions -165 to -1 of SEQ ID

<220> 

<221> misc_feature 

<222> (496)..(1059)

<223> Encodes the mature region shown in positions 1-188 of SEQ ID
NO: 43.



<400> 1 














gctactggag cattacctca gtctcctaca cctgaagcag atgcagtatc gatgcaagaa 60

gcattacaac gtgatcttga tcttacatca gctgaagctg aggaattact tgctgcacaa 120

gatacagcct ttgaagttga tgaagctgcc gctgaagcag ctggtgatgc atatggtggt 180

tcagtattcg atactgaatc actcgaactt actgtactag tgaccgatgc agcagctgtt 240

gaagctgttg aagccacagg tgcaggtaca gagctcgtat cttatggtat tgatggatta 300

gatgagatcg tacaagagct taatgcagct gatgccgttc caggtgtagt tggatggtat 360

cctgatgtag caggtgatac tgttgtctta gaagttcttg aaggctctgg agctgatgtt 420

tctggacttt tagcagacgc aggagtcgat gcatccgcgg ttgaagtgac cacgtcagat 480

cagcctgaac tctatgccga tatcattgga ggcctagcgt acacaatggg tggtcgctgc 540

agcgtaggat ttgcagccac aaatgcagct ggacaacctg gcttcgtgac agctggacat 600

tgcggccgcg tcggtacaca ggttactatc ggcaatggaa gaggtgtctt tgagcaaagc 660

gtatttcccg ggaatgatgc tgccttcgtt agaggtacgt ccaactttac gcttactaac 720

ttagtatcta gatacaacac tggcggatat gcaactgtag caggtcacaa tcaagcacct 780

attggctcta gcgtctgccg ctcagggtcg actacaggat ggcattgtgg aaccattcaa 840

gctagaggtc agagcgtgag ctatcctgaa ggtaccgtaa cgaacatgac tcgtacgact 900

gtatgtgcag aaccaggtga ctctggaggt tcatatatca gcggtacgca agcgcaaggc 960

gttacctcag gtggatccgg taactgtagg acaggtggca caacgttcta ccaggaagtg 1020

acaccgatgg tgaactcttg gggagttaga ctccgtacat aa 1062
10495.020-DK.ST25.txt

<210> 2

<211> 1143 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Polynucleotide encoding a S2A protease denoted 10R, fused in frame to the signal peptide
encoding sequence of a heterologous protease, Savinase.



<400> 2 














atgaagaaac cgttggggaa aattgtcgca agcaccgcac tactcatttc tgttgctttt 60

agttcatcga tcgcatcggc tgctactgga gcattacctc agtctcctac acctgaagca 120

gatgcagtat cgatgcaaga agcattacaa cgtgatcttg atcttacatc agctgaagct 180

gaggaattac ttgctgcaca agatacagcc tttgaagttg atgaagctgc cgctgaagca 240

gctggtgatg catatggtgg ttcagtattc gatactgaat cactcgaact tactgtacta 300

gtgaccgatg cagcagctgt tgaagctgtt gaagccacag gtgcaggtac agagctcgta 360

tcttatggta ttgatggatt agatgagatc gtacaagagc ttaatgcagc tgatgccgtt 420

ccaggtgtag ttggatggta tcctgatgta gcaggtgata ctgttgtctt agaagttctt 480

gaaggctctg gagctgatgt ttctggactt ttagcagacg caggagtcga tgcatccgcg 540

gttgaagtga ccacgtcaga tcagcctgaa ctctatgccg atatcattgg aggcctagcg 600

tacacaatgg gtggtcgctg cagcgtagga tttgcagcca caaatgcagc tggacaacct 660

ggcttcgtga cagctggaca ttgcggccgc gtcggtacac aggttactat cggcaatgga 720

agaggtgtct ttgagcaaag cgtatttccc gggaatgatg ctgccttcgt tagaggtacg 780

tccaacttta cgcttactaa cttagtatct agatacaaca ctggcggata tgcaactgta 840

gcaggtcaca atcaagcacc tattggctct agcgtctgcc gctcagggtc gactacagga 900

tggcattgtg gaaccattca agctagaggt cagagcgtga gctatcctga aggtaccgta 960

acgaacatga ctcgtacgac tgtatgtgca gaaccaggtg actctggagg ttcatatatc 1020

agcggtacgc aagcgcaagg cgttacctca ggtggatccg gtaactgtag gacaggtggc 1080

acaacgttct accaggaagt gacaccgatg gtgaactctt ggggagttag actccgtaca 1140

taa 1143



<210> 3 

<211> 8 

<212> PRT 

<213> Artificial sequence 
<220> 

<223> C-terminal amino acid tail expressed as fusion to protease of the
invention.

<400> 3 

Gln Ser His Val Gln Ser Ala Pro
1 5 

<210> 4 
<211> 24
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Polynucleotide encoding a C-terminal amino acid tail expressed as 
fusion to protease of the invention. 

<400> 4 

caatcgcatg ttcaatccgc tcca 24 

<210> 5 

<211> 4

<212> PRT 

<213> Artificial sequence 
<220> 

<223> C-terminal amino acid tail expressed as fusion to protease of the 
invention.

<400> 5 

Gln Ser Ala Pro
1 

<210> 6 
<211> 12 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Polynucleotide encoding a C-terminal amino acid tail expressed as 
fusion to protease of the invention. 

<400> 6 

caatcggctc ct 12

<210> 7 

<211> 2 

<212> PRT 

<213> Artificial sequence 
<220> 

<223> C-terminal amino acid tail expressed as fusion to protease of the 
invention. 

<400> 7 

Gln Pro
1 

<210> 8 
<211> 6 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Polynucleotide encoding a C-terminal amino acid tail expressed as 
fusion to protease of the invention. 

<400> 8 

caacca 6

<210> 9
<211> 1 
<212> PRT 

<213> Artificial sequence 
<220> 

<223> C-terminal amino acid tail expressed as fusion to protease of the
invention. 

<400> 9 

Pro 
1 



<210> 10 

<211> 3 

<212> DNA 

<213> Artificial sequence 
<220> 

<223> Polynucleotide encoding a C-terminal amino acid tail expressed as
fusion to protease of the invention. 

<400> 10 

cca 3



<210> 11 

<211> 45 

<212> DNA

<213> Artificial sequence 
<220> 

<223> Primer #252639 

<400> 11 

catgtgcatg tgggtaccgc aacgttcgca gatgctgctg aagag 45 

<210> 12 

<211> 44 

<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer #251992 

<400> 12 

catgtgcatg tggtcgaccg attatggagc ggattgaaca tgcg 44 

<210> 13 

<211> 44 

<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer #179541 

<400> 13 

gcgttgagac gcgcggccgc gagcgccgtt tggctgaatg atac 44 

<210> 14 

<211> 43 

<212> DNA 

<213> Artificial sequence 

<220>

<223> Primer #179542 
<400> 14

gcgttgagac agctcgagca gggaaaaatg gaaccgcttt ttc 43

<210> 15 

<211> 64 

<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer #179539 

<400> 15 

ccatttgatc agaattcact ggccgtcgtt ttacaaccat tgcggaaaat agtcataggc 60
atcc 64



<210> 16 

<211> 60 

<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer #179540 

<400> 16 

ggatccagat ctggtacccg ggtctagagt cgacgcggcg gttcgcgtcc ggacagcaca 60

<210> 17 

<211> 37 

<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer #179154 

<400> 17 

gttgtaaaac gacggccagt gaattctgat caaatgg 37

<210> 18 

<211> 37 

<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer #179153 

<400> 18 

ccgcgtcgac actagacacg ggtacctgat ctagatc 37

<210> 19 

<211> 22 

<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer #317 

<400> 19 

tggcgcaatc ggtaccatgg gg 22




<210> 20 

<211> 40 

<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer #139 NotI

<400> 20 

catgtgcatg cggccgcatt aacgcgttgc cgcttctgcg 40 

<210> 21 

<211> 7443 

<212> DNA 

<213> Artificial sequence 
<220> 

<223> sequence of plasmid pMB1508 

<400> 21 



tcgcgcgttt 


cggtgatgac 


QQtaaaaacc 


*>v t.y 0.1*0. 1. CI 1. 


ycdgci-cccg gagacggxca 


OO 


cagcttgtct 


gtaagcggat 


QCcaaaaaca 


ya>v.cLay wi»v»y 


tcagggcgcg tcagcgggtg 


120 


ttggcgggtg 


tcgggqctqq 


cttaactrata 


^-yy *-cl l v-ciy a 


gcagauxgra cxgagagxgc 


180 


accatatgcg 


gtgtgaaata 


ccacacaaat 


y (.aayyay 


aaaauaccgc axcaggcgcc 


240 


attcgccatt 


caggctgcgc 


aactatxaaa 


«»y yy v»ya li» 


yyxgcgggcc TCttcgctat 


300 


tacgccagct 


ggcgaaaqqq 




v.aayy v.ya i_ V. 


aaytLygyta acgccagggx 


360 


tttcccagtc acgacgttgt aaaacgacgg ccagtgaatt cgataaaagt gctttttttg 420

ttgcaattga agaattatta atgttaagct taattaaaga taatatcttt gaattgtaac 480

gcccctcaaa agtaagaact acaaaaaaag aatacgttat atagaaatat gtttgaacct 540

tcttcagatt acaaatatat tcggacggac tctacctcaa atgcttatct aactatagaa 600

tgacatacaa gcacaacctt gaaaatttga aaatataact accaatgaac ttgttcatgt 660

gaattatcgc tgtatttaat tttctcaatt caatatataa tatgccaata cattgttaca 720

agtagaaatt aagacaccct tgatagcctt actataccta acatgatgta gtattaaatg 780

aatatgtaaa tatatttatg ataagaagcg acttatttat aatcattaca tatttttcta 840

ttggaatgat taagattcca atagaatagt gtataaatta tttatcttga aaggagggat 900

gcctaaaaac gaagaacatt aaaaacatat atttgcaccg tctaatggat ttatgaaaaa 960

tcattttatc agtttgaaaa ttatgtatta tggagctctg aaaaaaagga gaggataaag 1020

aatgaagaaa ccgttgggga aaattgtcgc aagcaccgca ctactcattt ctgttgcttt 1080

tagttcatcg atcgcatcgg ctgctgaaga agcaaaagaa aaatatttaa ttggctttaa 1140

tgagcaggaa gctgtcagtg agtttgtaga acaagtagag gcaaatgacg aggtcgccat 1200

tctctctgag gaagaggaag tcgaaattga attgcttcat gaatttgaaa cgattcctgt 1260

tttatccgtt gagttaagcc cagaagatgt ggacgcgctt gaactcgatc cagcgatttc 1320

ttatattgaa gaggatgcag aagtaacgac aatggcgcaa
aacgcgttaa tccgcggata tatagcggcc gcagatctgg gaccaataat aatgactaga 1440

gaagaaagaa tgaagattgt tcatgaaatt aaggaacgaa tattggataa agtgggatat 1500 

ttttaaaata tatatttatg ttacagtaat attgactttt aaaaaaggat tgattctaat 1560 

gaagaaagca gacaagtaag cctcctaaat tcactttaga taaaaattta ggaggcatat 1620 

caaatgaact ttaataaaat tgatttagac aattggaaga gaaaagagat atttaatcat 1680 

tatttgaacc aacaaacgac ttttagtata accacagaaa ttgatattag tgttttatac 1740 


cgaaacataa aacaagaagg atataaattt taccctgcat ttattttctt agtgacaagg 1800 

gtgataaact caaatacagc ttttagaact ggttacaata gcgacggaga gttaggttat 1860 

tgggataagt tagagccact ttatacaatt tttgatggtg tatctaaaac attctctggt 1920 

ajttggactc ctgtaaagaa tgacttcaaa gagttttatg atttatacct ttctgatgta 1980 

gagaaatata atggttcggg gaaattgttt cccaaaacac ctatacctga aaatgctttt 2040 

tctctttcta ttattccatg gacttcattt actgggttta acttaaatat caataataat 2100

agtaattacc ttctacccat tattacagca ggaaaattca ttaataaagg taattcaata 2160 

tatttaccgc tatctttaca ggtacatcat tctgtttgtg atggttatca tgcaggattg 2220 

tttatgaact ctattcagga attgtcagat aggcctaatg actggctttt ataatatgag 2280 

ataatgccga ctgtactttt tacagtcggt tttctaacga tacattaata ggtacgaaaa 2340 

agcaactttt tttgcgctta aaaccagtca taccaataac ttaagggtaa ctagcctcgc 2400 

cggaaagagc gaaaatgcct cacatttgtg ccacctaaaa aggagcgatt tacatatgag 2460 

ttatgcagtt tgtagaatgc aaaaagtgaa atcagctgga ctaaaagggg ccgcagagta 2520 

gaatggaaaa ggggatcgga aaacaagtat ataggaggag acctatttat ggcttcagaa 2580 

aaagacgcag gaaaacagtc agcagtaaag cttgttccat tgcttattac tgtcgctgtg 2640 

ggactaatca tctggtttat tcccgctccg tccggacttg aacctaaagc ttggcatttg 2700 

tttgcgattt ttgtcgcaac aattatcggc tttatctcca agcccttgcc aatgggtgca 2760 

attgcaattt ttgcattggc ggttactgca ctaactggaa cactatcaat tgaggataca 2820

ttaagcggat tcgggaataa gaccatttgg cttatcgtta tcgcattctt tatttcccgg 2880 

ggatttatca aaaccggtct cggtgcgaga atttcgtatg tattcgttca gaaattcgga 2940 

aaaaaaaccc ttggactttc ttattcactg ctattcagtg atttaatact ttcacctgct 3000 

attccaagta atacggcgcg tgcaggaggc attatatttc ctattatcag atcattatcc 3060 

gaaacattcg gatcaagccc ggcaaatgga acagagagaa aaatcggtgc attcttatta 3120 

aaaaccggtt ttcaggggaa tctgatcaca tctgctatgt tcctgacagc gatggcggcg 3180 

aacccgctga ttgccaagct ggcccatgat gtcgcagggg tggacttaac atggacaagc 3240 

tgggcaattg ccgcgattgt accgggactt gtaagcttaa tcatcacgcc gcttgtgatt 3300 

tacaaactgt atccgccgga aatcaaagaa acaccggatg cggcgaaaat cgcaacagaa 3360 

aaactgaaag aaatgggacc gttcaaaaaa tcggagcttt ccatggttat cgtgtttctt 3420 
ttggtgcttg tgctgtggat ttttggcggc agcttcaaca tcgacgctac cacaaccgca 3480

ttgatcggtt tggccgttct cttattatca caagttctga cttgggatga tatcaagaaa 3540

gaacagggcg cttgggatac gctcacttgg tttgcggcgc ttgtcatgct cgccaacttc 3600

ttgaatgaat taggcatggt gtcttggttc agtaatgcca tgaaatcatc cgtatcaggg 3660

ttctcttgga ttgtggcatt catcatttta attgttgtgt attattactc tcactatttc 3720

tttgcaagtg cgacagccca catcagtgcg atgtattcag catttttggc tgtcgtcgtg 3780

gcagcgggcg caccgccgct tttagcagcg ctgagcctcg cgttcatcag caacctgttc 3840

gggtcaacga ctcactacgg ttctggagcg gctccggtct tcttcggagc aggctacatc 3900

ccgcaaggca aatggtggtc catcggattt atcctgtcga ttgttcatat catcgtatgg 3960

cttgtgatcg gcggattatg gtggaaagta ctaggaatat ggtagaaaga aaaaggcaga 4020

cgcggtctgc ctttttttat tttcactcct tcgtaagaaa atggattttg aaaaatgaga 4080

aaattccctg tgaaaaatgg tatgatctag gtagaaagga cggctggtgc tgtggtgaaa 4140

aagcggttcc atttttccct gcaaacaaaa ataatggggc tgattgcggc tctgctggtc 4200

tttgtcattg gtgtgctgac cattacgtta gccgttcagc atacacaggg agaacggaga 4260

caggcagagc agctggcggt tcaaacggcg agaaccattt cctatatgcc gccggttaaa 4320

gagctcattg agagaaaaga cggacatgcg gctcagacgc aagaggtcat tgaacaaatg 4380

aaagaacaga ctggtgcgtt tgccatttat gttttgaacg aaaaaggaga cattcgcagc 4440

gcctctggaa aaagcggatt aaagaaactg gagcgcagca gagaaatttt gtttggcggt 4500

tcgcatgttt ctgaaacaaa agcggatgga cgaagagtga tcagagggag cgcgccgatt 4560

ataaaagaac agaagggata cagccaagtg atcggcagcg tgtctgttga ttttctgcaa 4620

acggagacag agcaaagcat caaaaagcat ttgagaaatt tgagtgtgat tgctgtgctt 4680

gtactgctgc tcggatttat tggcgccgcc gtgctggcga aaagcatcag aaaggatacg 4740

ctcgggcttg aaccgcatga gatcgcggct ctatatcgtg agaggaacgc aatgcttttc 4800

gcgattcgag aagggattat tgccaccaat cgtgaaggcg tcgtcaccat gatgaacgta 4860

tcggcggccg agatgctgaa gctgcccgag cctgtgatcc atcttcctat agatgacgtc 4920

atgccgggag cagggctgat gtctgtgctt gaaaaaggag aaatgctgcc gaaccaggaa 4980

gtaagcgtca acgatcaagt gtttattatc aatacgaaag tgatgaatca aggcgggcag 5040

gcgtatggga ttgtcgtcag cttcagggag aaaacagagc tgaagaagct gatcgacaca 5100

ttgacagagg ttcgcaaata ttcagaggat ctcagggcgc agactcatga attttcaaat 5160

aagctttatg cgattttagg gctgcgtcga cctgcaggca tgcaagcttg gcgtaatcat 5220

ggtcatagct gtttcctgtg tgaaattgtt atccgctcac aattccacac aacatacgag 5280

ccggaagcat aaagtgtaaa gcctggggtg cctaatgagt gagctaactc acattaattg 5340

cgttgcgctc actgcccgct ttccagtcgg gaaacctgtc gtgccagctg cattaatgaa 5400

tcggccaacg cgcggggaga ggcggtttgc gtattgggcg ctcttccgct tcctcgctca 5460
ctgactcgct gcgctcggtc gttcggctgc ggcgagcggt atcagctcac tcaaaggcgg 5520

taatacggtt atccacagaa tcaggggata acgcaggaaa gaacatgtga gcaaaaggcc 5580 

agcaaaaggc caggaaccgt aaaaaggccg cgttgctggc gtttttccat aggctccgcc 5640 

cccctgacga gcatcacaaa aatcgacgct caagtcagag gtggcgaaac ccgacaggac 5700 

tataaagata ccaggcgttt ccccctggaa gctccctcgt gcgctctcct gttccgaccc 5760 

tgccgcttac cggatacctg tccgcctttc tcccttcggg aagcgtggcg ctttctcata 5820 

gctcacgctg taggtatctc agttcggtgt aggtcgttcg ctccaagctg ggctgtgtgc 5880 

acgaaccccc cgttcagccc gaccgctgcg ccttatccgg taactatcgt cttgagtcca 5940 

acccggtaag acacgactta tcgccactgg cagcagccac tggtaacagg attagcagag 6000 

cgaggtatgt aggcggtgct acagagttct tgaagtggtg gcctaactac ggctacacta 6060

gaaggacagt atttggtatc tgcgctctgc tgaagccagt taccttcgga aaaagagttg 6120 

gtagctcttg atccggcaaa caaaccaccg ctggtagcgg tggttttttt gtttgcaagc 6180 

agcagattac gcgcagaaaa aaaggatctc aagaagatcc tttgatcttt tctacggggt 6240 

ctgacgctca gtggaacgaa aactcacgtt aagggatttt ggtcatgaga ttatcaaaaa 6300 

ggatcttcac ctagatcctt ttaaattaaa aatgaagttt taaatcaatc taaagtatat 6360 

atgagtaaac ttggtctgac agttaccaat gcttaatcag tgaggcacct atctcagcga 6420 

tctgtctatt tcgttcatcc atagttgcct gactccccgt cgtgtagata actacgatac 6480 

gggagggctt accatctggc cccagtgctg caatgatacc gcgagaccca cgctcaccgg 6540 

caccggattt atcagcaata aaccagccag ccggaagggc cgagcgcaga agtggtcctg 6600 

caactttatc cgcctccatc cagtctatta attgttgccg ggaagctaga gtaagtagtt 6660 

cgccagttaa tagtttgcgc aacgttgttg ccattgctac aggcatcgtg gtgtcacgct 6720 

cgtcgtttgg tatggcttca ttcagctccg gttcccaacg atcaaggcga gttacatgat 6780 

cccccatgtt gtgcaaaaaa gcggttagct ccttcggtcc tccgatcgtt gtcagaagta 6840 

agttggccgc agtgttatca ctcatggtta tggcagcact gcataattct cttactgtca 6900 

tgccatccgt aagatgcttt tctgtgactg gtgagtactc aaccaagtca ttctgagaat 6960 

agtgtatgcg gcgaccgagt tgctcttgcc cggcgtcaat acgggataat accgcgccac 7020 

atagcagaac tttaaaagtg ctcatcattg gaaaacgttc ttcggggcga aaactctcaa 7080 

ggatcttacc gctgttgaga tccagttcga tgtaacccac tcgtgcaccc aactgatctt 7140 

cagcatcttt tactttcacc agcgtttctg ggtgagcaaa aacaggaagg caaaatgccg 7200 

caaaaaaggg aataagggcg acacggaaat gttgaatact catactcttc ctttttcaat 7260 

attattgaag catttatcag ggttattgtc tcatgagcgg atacatattt gaatgtattt 7320 

agaaaaataa acaaataggg gttccgcgca catttccccg aaaagtgcca cctgacgtct 7380 

aagaaaccat tattatcatg acattaacct ataaaaatag gcgtatcacg aggccctttc 7440 

gtc 7443




<210> 22 
<211> 5718 
<212> DNA 

<213> Artificial sequence 
<220> 

<223> Sequence of MB1510 genomic integration region 
<400> 22 

gagcgccgtt tggctgaatg atacaacagt ctcacttcct tactgcgtct ggttgcaaaa 60 

acgaagaagc aaggattccc ctcgcttctc atttgtccta tttattatac acttttttaa 120 

gcacatcttt ggcgcttgtt tcactagact tgatgcctct gaatcttgtc caagtgtcac 180 

ggtccgcatc atagacttgt ccatttttca ccgctttgag atttttccag agcgggttcg 240 

ttttccactc atctacaatg gttttgcctt cgttggctga gatgaacaaa atatcaggat 300 

cgattttgct caattgctca aggctgacct cttgataggc gttatctgac ttcacagcgt 360

gtgtaaagcc tagcatttta aagatttctc cgtcatagga tgatgatgta tgaagctgga 420 

aggaatccgc tcttgcaacg ccgagaacga tgttgcggtt ttcatctttc ggaagttcgg 480 

cttttagatc gttgatgact tttttgtgct cggcaagctt ttcttttcct tcatcttctt 540 

tatttaatgc tttagcaatg gtcgtaaagc tgtcgatcgt ttcgtcatat gtcgcttcac 600 

ggctttttaa ttcaatcgtc ggggcgattt ttttcagctg tttataaatg tttttatggc 660 

gctcagcgtc agcgatgatt aaatcaggct tcaaggaact gatgacctca agattgggtt 720

cgctgcgtgt gcctacagat gtgtaatcaa tggagctgcc gacaagcttt ttaatcatat 780 

cttttttgtt gtcatctgcg atgcccaccg gcgtaatgcc gagattgtga acggcatcca 840 

agaatgaaag ctcaagcaca accacccgct taggtgtgcc gcttactgtc gtttttcctt 900 

cttcgtcatg gatcactctg gaatccttag actcgctttt gccgcttccg ttgttattct 960 

ggcttgatga acagccggat acaatgaggc aggcgagcaa taaaacactc atgatggcaa 1020 

tcaacttgtt agaataggtg cgcatgtcat tcttcctttt ttcagattta gtaatgagaa 1080

tcattatcac atgtaacact ataatagcat ggcttatcat gtcaatattt ttttagtaaa 1140 

gaaagctgcg tttttactgc tttctcatga aagcatcatc agacacaaat aagtggtatg 1200 

cagcgttacc gtgtcttcga gacaaaaacg catgggcgtt ggctttagag gtttcgaaca 1260 

tatcagcagt gacataagga aggagagtgc tgagataacc ggacaatttc ttttctattt 1320 

catctgttag tgcaaattca atgtcgccga tattcatgat aatcgagaaa acaaagtcga 1380 

tatcgatatg aaaatgttcc tcggcaaaaa ccgcaagctc gtgaattcct ggtgaacatc 1440 

cggcacgctt atggaaaatc tgtttgacta aatcactcac aatccaagca ttgtattgct 1500 

gttctggtga aaagtattgc attagacata cctcctgctc gtacggataa aggcagcgtt 1560 

tcatggtcgt gtgctccgtg cagcggcttc tccttaattt tgatttttct gaaaataggt 1620 

cccgttccta tcactttacc atggacggaa aacaaatagc tactaccatt cctcctgttt 1680 

ttctcttcaa tgttctggaa tctgtttcag gtacagacga tcgggtatga aagaaatata 1740 
gaaaacatga aggaggaata tcgacatgaa accagttgta aaagagtata caaatgacga 1800

acagctcatg aaagatgtag aggaattgca gaaaatgggt gttgcgaaag aggatgtata 1860 

cgtcttagct cacgacgatg acagaacgga acgcctggct gacaacacga acgccaacac 1920 

gatcggagcc aaagaaacag gtttcaagca cgcggtggga aatatcttca ataaaaaagg 1980 

agacgagctc cgcaataaaa ttcacgaaat cggtttttct gaagatgaag ccgctcaatt 2040 

tgaaaaacgc ttagatgaag gaaaagtgct tctctttgtg acagataacg aaaaagtgaa 2100 

agcttgggca taaagcaagg aaaaaaccaa aaggccaatg tcggcctttt ggtttttttg 2160 

cggtctttgc ggtgggattt tgcagaatgc cgcaatagga tagcggaaca ttttcggttc 2220 

tgaatgtccc tcaatttgct attatatttt tgtgataaat tggaataaaa tctcacaaaa 2280 

tagaaaatgg gggtacatag tggatgaaaa aagtgatgtt agctacggct ttgtttttag 2340

gattgactcc agctggcgcg aacgcagctg atttaggcca ccagacgttg ggatccaatg 2400 

atggctgggg cgcgtactcg accggcacga caggcggatc aaaagcatcc tcctcaaatg 2460 

tgtataccgt cagcaacaga aaccagcttg tctcggcatt agggaaggaa acgaacacaa 2520 

cgccaaaaat catttatatc aagggaacga ttgacatgaa cgtggatgac aatctgaagc 2580 

cgcttggcct aaatgactat aaagatccgg agtatgattt ggacaaatat ttgaaagcct 2640 

atgatcctag cacatggggc aaaaaagagc cgtcgggaac acaagaagaa gcgagagcac 2700 

gctctcagaa aaaccaaaaa gcacgggtca tggtggatat ccctgcaaac acgacgatcg 2760 

tcggttcagg gactaacgct aaagtcgtgg gaggaaactt ccaaatcaag agtgataacg 2820 

tcattattcg caacattgaa ttccaggatg cctatgacta ttttccgcaa tggttgtaaa 2880 

acgacggcca gtgaattctg atcaaatggt tcagtgagag cgaagcgaac acttgatttt 2940 

ttaattttct atcttttata ggtcattaga gtatacttat ttgtcctata aactatttag 3000 

cagcataata gatttattga ataggtcatt taagttgagc atattagagg aggaaaatct 3060 

tggagaaata tttgaagaac ccgagatcta gatcaggtac cgcaacgttc gcagatgctg 3120

ctgaagagat tattaaaaag ctgaaagcaa aaggctatca attggtaact gtatctcagc 3180 

ttgaagaagt gaagaagcag agaggctatt gaataaatga gtagaaagcg ccatatcggc 3240 

gcttttcttt tggaagaaaa tatagggaaa atggtacttg ttaaaaattc ggaatattta 3300 

tacaatatca tatgtatcac attgaaagga ggggcctgct gtccagactg tccgctgtgt 3360 

aaaaataagg aataaagggg ggttgacatt attttactga tatgtataat ataatttgta 3420 

taagaaaatg gaggggccct cgaaacgtaa gatgaaacct tagataaaag tgcttttttt 3480 

gttgcaattg aagaattatt aatgttaagc ttaattaaag ataatatctt tgaattgtaa 3540 

cgcccctcaa aagtaagaac tacaaaaaaa gaatacgtta tatagaaata tgtttgaacc 3600 

ttcttcagat tacaaatata ttcggacgga ctctacctca aatgcttatc taactataga 3660 

atgacataca agcacaacct tgaaaatttg aaaatataac taccaatgaa cttgttcatg 3720 

tgaattatcg ctgtatttaa ttttctcaat tcaatatata atatgccaat acattgttac 3780 
aagtagaaat taagacaccc ttgatagcct tactatacct aacatgatgt agtattaaat 3840

gaatatgtaa atatatttat gataagaagc gacttattta taatcattac atatttttct 3900 

attggaatga ttaagattcc aatagaatag tgtataaatt atttatcttg aaaggaggga 3960 

tgcctaaaaa cgaagaacat taaaaacata tatttgcacc gtctaatgga tttatgaaaa 4020 

atcattttat cagtttgaaa attatgtatt atggagctct gaaaaaaagg agaggataaa 4080 

gagaaaaggg gatcggaaaa caagtatata ggaggagacc tatttatggc ttcagaaaaa 4140 

gacgcaggaa aacagtcagc agtaaagctt gttccattgc ttattactgt cgctgtggga 4200 

ctaatcatct ggtttattcc cgctccgtcc ggacttgaac ctaaagcttg gcatttgttt 4260 

gcgatttttg tcgcaacaat tatcggcttt atctccaagc ccttgccaat gggtgcaatt 4320 

gcaatttttg cattggcggt tactgcacta actggaacac tatcaattga ggatacatta 4380 

agcggattcg ggaataagac catttggctt atcgttatcg cattctttat ttcccgggga 4440 

tttatcaaaa ccggtctcgg tgcgagaatt tcgtatgtat tcgttcagaa attcggaaaa 4500 

aaaacccttg gactttctta ttcactgcta ttcagtgatt taatactttc acctgctatt 4560 

ccaagtaata cggcgcgtgc aggaggcatt atatttccta ttatcagatc attatccgaa 4620 

acattcggat caagcccggc aaatggaaca gagagaaaaa tcggtgcatt cttattaaaa 4680 

accggttttc aggggaatct gatcacatct gctatgttcc tgacagcgat ggcggcgaac 4740 

ccgctgattg ccaagctggc ccatgatgtc gcaggggtgg acttaacatg gacaagctgg 4800 

gcaattgccg cgattgtacc gggacttgta agcttaatca tcacgccgct tgtgatttac 4860 

aaactgtatc cgccggaaat caaagaaaca ccggatgcgg cgaaaatcgc aacagaaaaa 4920 

ctgaaagaaa tgggaccgtt caaaaaatcg gagctttcca tggttatcgt gtttcttttg 4980 

gtgcttgtgc tgtggatttt tggcggcagc ttcaacatcg acgctaccac aaccgcattg 5040 

atcggtttgg ccgttctctt attatcacaa gttctgactt gggatgatat caagaaagaa 5100 

cagggcgctt gggatacgct cacttggttt gcggcgcttg tcatgctcgc caacttcttg 5160 

aatgaattag gcatggtgtc ttggttcagt aatgccatga aatcatccgt atcagggttc 5220 

tcttggattg tggcattcat cattttaatt gttgtgtatt attactctca ctatttcttt 5280 

gcaagtgcga cagcccacat cagtgcgatg tattcagcat ttttggctgt cgtcgtggca 5340 

gcgggcgcac cgccgctttt agcagcgctg agcctcgcgt tcatcagcaa cctgttcggg 5400 

tcaacgactc actacggttc tggagcggct ccggtcttct tcggagcagg ctacatcccg 5460 

caaggcaaat ggtggtccat cggatttatc ctgtcgattg ttcatatcat cgtatggctt 5520 

gtgatcggcg gattatggtg gaaagtacta ggaatatggt agaaagaaaa aggcagacgc 5580 

ggtctgcctt tttttatttt cactccttcg taagaaaatg gattttgaaa aatgagaaaa 5640 

ttccctgtga aaaatggtat gatctaggta gaaaggacgg ctggtgctgt ggtgaaaaag 5700 

cggttccatt tttccctg 5718 
<210> 23

<211> 27

<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 1605 

<400> 23 

gacggccagt gaattcgata aaagtgc 27

<210> 24

<211> 42

<212> DNA

<213> Artificial sequence
<220>

<223> Primer 1606
<220>

<221> misc_feature

<222> (13)..(13)

<223> n is a, c, g, or t

<220>

<221> misc_feature

<222> (16)..(16)

<223> n is a, c, g, or t

<400> 24

ccagatctct atnktnktgt acggagtcta actccccaag ag 42

<210> 25 
<211> 1112
<212> DNA 

<213> Nocardiopsis dassonvillei DSM 43235
<400> 25 

gcttttagtt catcgatcgc atcggctgct ccggcccccg tcccccagac ccccgtcgcc 60 

gacgacagcg ccgccagcat gaccgaggcg ctcaagcgcg acctcgacct cacctcggcc 120 

gaggccgagg agcttctctc ggcgcaggaa gccgccatcg agaccgacgc cgaggccacc 180 

gaggccgcgg gcgaggccta cggcggctca ctgttcgaca ccgagaccct cgaactcacc 240 

gtgctggtca ccgacgcctc cgccgtcgag gcggtcgagg ccaccggagc ccaggccacc 300 

gtcgtctccc acggcaccga gggcctgacc gaggtcgtgg aggacctcaa cggcgccgag 360 

gttcccgaga gcgtcctcgg ctggtacccg gacgtggaga gcgacaccgt cgtggtcgag 420 

gtgctggagg gctccgacgc cgacgtcgcc gccctgctcg ccgacgccgg tgtggactcc 480 

tcctcggtcc gggtggagga ggccgaggag gccccgcagg tctacgccga catcatcggc 540 

ggcctggcct actacatggg cggccgctgc tccgtcggct tcgccgcgac caacagcgcc 600 

ggtcagcccg gtttcgtcac cgccggccac tgcggcaccg tcggcaccgg cgtgaccatc 660 

ggcaacggca ccggcacctt ccagaactcg gtcttccccg gcaacgacgc cgccttcgtc 720 

cgcggcacct ccaacttcac cctgaccaac ctggtctcgc gctacaactc cggcggctac 780 

cagtcggtga ccggtaccag ccaggccccg gccggctcgg ccgtgtgccg ctccggctcc 840 

accaccggct ggcactgcgg caccatccag gcccgcaacc agaccgtgcg ctacccgcag 900 

ggcaccgtct actcgctcac ccgcaccaac gtgtgcgccg agcccggcga ctccggcggt 960 

tcgttcatct ccggctcgca ggcccagggc gtcacctccg gcggctccgg caactgctcc 1020 

gtcggcggca cgacctacta ccaggaggtc accccgatga tcaactcctg gggtgtcagg 1080 

atccggacct aatcgcatgt tcaatccgct cc 1112 

<210> 26 

<211> 48 

<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 1423 

<400> 26 

gcttttagtt catcgatcgc atcggctgct ccggcccccg tcccccag 48 

<210> 27 

<211> 45 

<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 1475 

<400> 27 

ggagcggatt gaacatgcga ttaggtccgg atcctgacac cccag 45 

<210> 28 

<211> 354 

<212> PRT 

<213> Nocardiopsis dassonvillei DSM 43235 
<220> 

<221> PROPEP 

<222> (1)..(166)

<220> 

<221> mat_peptide 
<222> (167)..(354)

<400> 28 

Ala Pro Ala Pro Val Pro Gln Thr Pro Val Ala Asp Asp Ser Ala
-165 -160 -155

Ala Ser Met Thr Glu Ala Leu Lys Arg Asp Leu Asp Leu Thr Ser
-150 -145 -140

Ala Glu Ala Glu Glu Leu Leu Ser Ala Gln Glu Ala Ala Ile Glu
-135 -130 -125

Thr Asp Ala Glu Ala Thr Glu Ala Ala Gly Glu Ala Tyr Gly Gly
-120 -115 -110

Ser Leu Phe Asp Thr Glu Thr Leu Glu Leu Thr Val Leu Val Thr Asp
-105 -100 -95

Ala Ser Ala Val Glu Ala Val Glu Ala Thr Gly Ala Gln Ala Thr Val
-90 -85 -80 -75

Val Ser His Gly Thr Glu Gly Leu Thr Glu Val Val Glu Asp Leu Asn
-70 -65 -60

Gly Ala Glu Val Pro Glu Ser Val Leu Gly Trp Tyr Pro Asp Val Glu
-55 -50 -45

Ser Asp Thr Val Val Val Glu Val Leu Glu Gly Ser Asp Ala Asp Val
-40 -35 -30

Ala Ala Leu Leu Ala Asp Ala Gly Val Asp Ser Ser Ser Val Arg Val
-25 -20 -15

Glu Glu Ala Glu Glu Ala Pro Gln Val Tyr Ala Asp Ile Ile Gly Gly
-10 -5 -1 1 5

Leu Ala Tyr Tyr Met Gly Gly Arg Cys Ser Val Gly Phe Ala Ala Thr
10 15 20

Asn Ser Ala Gly Gln Pro Gly Phe Val Thr Ala Gly His Cys Gly Thr
25 30 35

Val Gly Thr Gly Val Thr Ile Gly Asn Gly Thr Gly Thr Phe Gln Asn
40 45 50

Ser Val Phe Pro Gly Asn Asp Ala Ala Phe Val Arg Gly Thr Ser Asn
55 60 65 70

Phe Thr Leu Thr Asn Leu Val Ser Arg Tyr Asn Ser Gly Gly Tyr Gln
75 80 85

Ser Val Thr Gly Thr Ser Gln Ala Pro Ala Gly Ser Ala Val Cys Arg
90 95 100

Ser Gly Ser Thr Thr Gly Trp His Cys Gly Thr Ile Gln Ala Arg Asn
105 110 115

Gln Thr Val Arg Tyr Pro Gln Gly Thr Val Tyr Ser Leu Thr Arg Thr
120 125 130

Asn Val Cys Ala Glu Pro Gly Asp Ser Gly Gly Ser Phe Ile Ser Gly
135 140 145 150

Ser Gln Ala Gln Gly Val Thr Ser Gly Gly Ser Gly Asn Cys Ser Val
155 160 165

Gly Gly Thr Thr Tyr Tyr Gln Glu Val Thr Pro Met Ile Asn Ser Trp
170 175 180

Gly Val Arg Ile Arg Thr
185

<210> 29 
<211> 498 
<212> DNA 

<213> Nocardiopsis dassonvillei DSM 43235
<400> 29 

gctccggccc ccgtccccca gacccccgtc gccgacgaca gcgccgccag catgaccgag 60 
gcgctcaagc gcgacctcga cctcacctcg gccgaggccg aggagcttct ctcggcgcag 120 
gaagccgcca tcgagaccga cgccgaggcc accgaggccg cgggcgaggc ctacggcggc 180 
tcactgttcg acaccgagac cctcgaactc accgtgctgg tcaccgacgc ctccgccgtc 240 
gaggcggtcg aggccaccgg agcccaggcc accgtcgtct cccacggcac cgagggcctg 300 
accgaggtcg tggaggacct caacggcgcc gaggttcccg agagcgtcct cggctggtac 360 
ccggacgtgg agagcgacac cgtcgtggtc gaggtgctgg agggctccga cgccgacgtc 420 
gccgccctgc tcgccgacgc cggtgtggac tcctcctcgg tccgggtgga ggaggccgag 480 
gaggccccgc aggtctac 498 

<210> 30

<211> 166

<212> PRT 

<213> Nocardiopsis dassonvillei DSM 43235

<400> 30 

Ala Pro Ala Pro Val Pro Gln Thr Pro Val Ala Asp Asp Ser Ala Ala
1 5 10 15

Ser Met Thr Glu Ala Leu Lys Arg Asp Leu Asp Leu Thr Ser Ala Glu
20 25 30

Ala Glu Glu Leu Leu Ser Ala Gln Glu Ala Ala Ile Glu Thr Asp Ala
35 40 45

Glu Ala Thr Glu Ala Ala Gly Glu Ala Tyr Gly Gly Ser Leu Phe Asp
50 55 60

Thr Glu Thr Leu Glu Leu Thr Val Leu Val Thr Asp Ala Ser Ala Val
65 70 75 80

Glu Ala Val Glu Ala Thr Gly Ala Gln Ala Thr Val Val Ser His Gly
85 90 95

Thr Glu Gly Leu Thr Glu Val Val Glu Asp Leu Asn Gly Ala Glu Val
100 105 110

Pro Glu Ser Val Leu Gly Trp Tyr Pro Asp Val Glu Ser Asp Thr Val
115 120 125

Val Val Glu Val Leu Glu Gly Ser Asp Ala Asp Val Ala Ala Leu Leu
130 135 140

Ala Asp Ala Gly Val Asp Ser Ser Ser Val Arg Val Glu Glu Ala Glu
145 150 155 160

Glu Ala Pro Gln Val Tyr
165

<210> 31 
<211> 1146 
<212> DNA 

<213> Artificial sequence
<220> 

<223> The DNA sequence coding for the pro-region of SEQ ID NO: 29 fused
cSn«fSct?°lJ^roAl§18S? e taiUariant encodin 9 whole 



<400> 31

atgaagaaac cgttggggaa aattgtcgca agcaccgcac tactcatttc tgttgctttt 60

agttcatcga tcgcatcggc tgctccggcc cccgtccccc agacccccgt cgccgacgac 120

agcgccgcca gcatgaccga ggcgctcaag cgcgacctcg acctcacctc ggccgaggcc 180

gaggagcttc tctcggcgca ggaagccgcc atcgagaccg acgccgaggc caccgaggcc 240

gcgggcgagg cctacggcgg ctcactgttc gacaccgaga ccctcgaact caccgtgctg 300

gtcaccgacg cctccgccgt cgaggcggtc gaggccaccg gagcccaggc caccgtcgtc 360

tcccacggca ccgagggcct gaccgaggtc gtggaggacc tcaacggcgc cgaggttccc 420

gagagcgtcc tcggctggta cccggacgtg gagagcgaca ccgtcgtggt cgaggtgctg 480

gagggctccg acgccgacgt cgccgccctg ctcgccgacg ccggtgtgga ctcctcctcg 540

gtccgggtgg aggaggccga ggaggccccg caggtctatg ccgatatcat tggaggccta 600

gcgtacacaa tgggtggtcg ctgcagcgta ggatttgcag ccacaaatgc agctggacaa 660

cctggcttcg tgacagctgg acattgcggc cgcgtcggta cacaggttac tatcggcaat 720

ggaagaggtg tctttgagca aagcgtattt cccgggaatg atgctgcctt cgttagaggt 780

acgtccaact ttacgcttac taacttagta tctagataca acactggcgg atatgcaact 840

gtagcaggtc acaatcaagc acctattggc tctagcgtct gccgctcagg gtcgactaca 900

ggatggcatt gtggaaccat tcaagctaga ggtcagagcg tgagctatcc tgaaggtacc 960

gtaacgaaca tgactcgtac gactgtatgt gcagaaccag gtgactctgg aggttcatat 1020

atcagcggta cgcaagcgca aggcgttacc tcaggtggat ccggtaactg taggacaggt 1080

ggcacaacgt tctaccagga agtgacaccg atggtgaact cttggggagt tagactccgt 1140

acataa 1146
<210> 32

<211> 1068 

<212> DNA

<213> Nocardiopsis alba DSM 15647

<400> 32

gcgaccggcc ccctccccca gtcccccacc ccggatgaag ccgaggccac caccatggtc 60

gaggccctcc agcgcgacct cggcctgtcc ccctctcagg ccgacgagct cctcgaggcg 120

caggccgagt ccttcgagat cgacgaggcc gccaccgcgg ccgcagccga ctcctacggc 180

ggctccatct tcgacaccga cagcctcacc ctgaccgtcc tggtcaccga cgcctccgcc 240

gtcgaggcgg tcgaggccgc cggcgccgag gccaaggtgg tctcgcacgg catggagggc 300

ctggaggaga tcgtcgccga cctgaacgcg gccgacgctc agcccggcgt cgtgggctgg 360

taccccgaca tccactccga cacggtcgtc ctcgaggtcc tcgagggctc cggtgccgac 420

gtggactccc tgctcgccga cgccggtgtg gacaccgccg acgtcaaggt ggagagcacc 480

accgagcagc ccgagctgta cgccgacatc atcggcggtc tcgcctacac catgggtggg 540

cgctgctcgg tcggcttcgc ggccaccaac gcctccggcc agcccgggtt cgtcaccgcc 600

ggccactgcg gcaccgtcgg caccccggtc agcatcggca acggccaggg cgtcttcgag 660

cgttccgtct tccccggcaa cgactccgcc ttcgtccgcg gcacctcgaa cttcaccctg 720

accaacctgg tcagccgcta caacaccggt ggttacgcga ccgtctccgg ctcctcgcag 780

gcggcgatcg gctcgcagat ctgccgttcc ggctccacca ccggctggca ctgcggcacc 840

gtccaggccc gcggccagac ggtgagctac ccccagggca ccgtgcagaa cctgacccgc 900

accaacgtct gcgccgagcc cggtgactcc ggcggctcct tcatctccgg cagccaggcc 960

cagggcgtca cctccggtgg ctccggcaac tgctccttcg gtggcaccac ctactaccag 1020

gaggtcaacc cgatgctgag cagctggggt ctgaccctgc gcacctga 1068



<210> 33 

<211> 355 

<212> PRT

<213> Nocardiopsis alba DSM 15647
<220> 

<221> PROPEP 

<222> (1)..(167)

<220> 

<221> mat_peptide 
<222> (168)..(355)

<400> 33 

Ala Thr Gly Pro Leu Pro Gln Ser Pro Thr Pro Asp Glu Ala Glu
-165 -160 -155

Ala Thr Thr Met Val Glu Ala Leu Gln Arg Asp Leu Gly Leu Ser
-150 -145 -140

Pro Ser Gln Ala Asp Glu Leu Leu Glu Ala Gln Ala Glu Ser Phe
-135 -130 -125

Glu Ile Asp Glu Ala Ala Thr Ala Ala Ala Ala Asp Ser Tyr Gly
-120 -115 -110

Gly Ser Ile Phe Asp Thr Asp Ser Leu Thr Leu Thr Val Leu Val Thr
-105 -100 -95

Asp Ala Ser Ala Val Glu Ala Val Glu Ala Ala Gly Ala Glu Ala Lys
-90 -85 -80

Val Val Ser His Gly Met Glu Gly Leu Glu Glu Ile Val Ala Asp Leu
-75 -70 -65 -60

Asn Ala Ala Asp Ala Gln Pro Gly Val Val Gly Trp Tyr Pro Asp Ile
-55 -50 -45

His Ser Asp Thr Val Val Leu Glu Val Leu Glu Gly Ser Gly Ala Asp
-40 -35 -30

Val Asp Ser Leu Leu Ala Asp Ala Gly Val Asp Thr Ala Asp Val Lys
-25 -20 -15

Val Glu Ser Thr Thr Glu Gln Pro Glu Leu Tyr Ala Asp Ile Ile Gly
-10 -5 -1 1 5

Gly Leu Ala Tyr Thr Met Gly Gly Arg Cys Ser Val Gly Phe Ala Ala
10 15 20

Thr Asn Ala Ser Gly Gln Pro Gly Phe Val Thr Ala Gly His Cys Gly
25 30 35

Thr Val Gly Thr Pro Val Ser Ile Gly Asn Gly Gln Gly Val Phe Glu
40 45 50

Arg Ser Val Phe Pro Gly Asn Asp Ser Ala Phe Val Arg Gly Thr Ser
55 60 65

Asn Phe Thr Leu Thr Asn Leu Val Ser Arg Tyr Asn Thr Gly Gly Tyr
70 75 80 85

Ala Thr Val Ser Gly Ser Ser Gln Ala Ala Ile Gly Ser Gln Ile Cys
90 95 100

Arg Ser Gly Ser Thr Thr Gly Trp His Cys Gly Thr Val Gln Ala Arg
105 110 115

Gly Gln Thr Val Ser Tyr Pro Gln Gly Thr Val Gln Asn Leu Thr Arg
120 125 130

Thr Asn Val Cys Ala Glu Pro Gly Asp Ser Gly Gly Ser Phe Ile Ser
135 140 145

Gly Ser Gln Ala Gln Gly Val Thr Ser Gly Gly Ser Gly Asn Cys Ser
150 155 160 165

Phe Gly Gly Thr Thr Tyr Tyr Gln Glu Val Asn Pro Met Leu Ser Ser
170 175 180

Trp Gly Leu Thr Leu Arg Thr
185

<210> 34 
<211> 43 
<212> DNA

<213> Artificial sequence 
<220> 

<223> Primer 1421
<400> 34 

gttcatcgat cgcatcggct gcgaccggcc ccctccccca gtc 43 

<210> 35 

<211> 31 

<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 1604

<400> 35 

gcggatccta tcaggtgcgc agggtcagac c 31

<210> 36 
<211> 1062 
<212> DNA 

<213> Nocardiopsis prasina DSM 15648 
<400> 36 

gccaccggac cgctccccca gtcacccacc ccggaggccg acgccgtctc catgcaggag 60 

gcgctccagc gcgacctcgg cctgaccccg cttgaggccg atgaactgct ggccgcccag 120 

gacaccgcct tcgaggtcga cgaggccgcg gccgcggccg ccggggacgc ctacggcggc 180 

tccgtcttcg acaccgagac cctggaactg accgtcctgg tcaccgacgc cgcctcggtc 240 

gaggctgtgg aggccaccgg cgcgggtacc gaactcgtct cctacggcat cgagggcctc 300 

gacgagatca tccaggatct caacgccgcc gacgccgtcc ccggcgtggt cggctggtac 360 

ccggacgtgg cgggtgacac cgtcgtcctg gaggtcctgg agggttccgg agccgacgtg 420 

agcggcctgc tcgccgacgc cggcgtggac gcctcggccg tcgaggtgac cagcagtgcg 480 

cagcccgagc tctacgccga catcatcggc ggtctggcct acaccatggg cggccgctgt 540 

tcggtcggat tcgcggccac caacgccgcc ggtcagcccg gattcgtcac cgccggtcac 600 

tgtggccgcg tgggcaccca ggtgagcatc ggcaacggcc agggcgtctt cgagcagtcc 660 

atcttcccgg gcaacgacgc cgccttcgtc cgcggcacgt ccaacttcac gctgaccaac 720 
ctggtcagcc gctacaacac cggcggttac gccaccgtcg ccggccacaa ccaggcgccc 780

atcggctcct ccgtctgccg ctccggctcc accaccggct ggcactgcgg caccatccag 840 

gcccgcggcc agtcggtgag ctaccccgag ggcaccgtca ccaacatgac ccggaccacc 900 

gtgtgcgccg agcccggcga ctccggcggc tcctacatct ccggcaacca ggcccagggc 960 

gtcacctccg gcggctccgg caactgccgc accggcggga ccaccttcta ccaggaggtc 1020 

acccccatgg tgaactcctg gggcgtccgt ctccggacct aa 1062

<210> 37 

<211> 353 

<212> PRT 

<213> Nocardiopsis prasina DSM 15648 
<220> 

<221> PROPEP 

<222> (1)..(165)

<220> 

<221> mat_peptide
<222> (166)..(353)

<400> 37 

Ala Thr Gly Pro Leu Pro Gln Ser Pro Thr Pro Glu Ala Asp Ala
-165 -160 -155

Val Ser Met Gln Glu Ala Leu Gln Arg Asp Leu Gly Leu Thr Pro
-150 -145 -140

Leu Glu Ala Asp Glu Leu Leu Ala Ala Gln Asp Thr Ala Phe Glu
-135 -130 -125

Val Asp Glu Ala Ala Ala Ala Ala Ala Gly Asp Ala Tyr Gly Gly
-120 -115 -110

Ser Val Phe Asp Thr Glu Thr Leu Glu Leu Thr Val Leu Val Thr Asp
-105 -100 -95 -90

Ala Ala Ser Val Glu Ala Val Glu Ala Thr Gly Ala Gly Thr Glu Leu
-85 -80 -75

Val Ser Tyr Gly Ile Glu Gly Leu Asp Glu Ile Ile Gln Asp Leu Asn
-70 -65 -60

Ala Ala Asp Ala Val Pro Gly Val Val Gly Trp Tyr Pro Asp Val Ala
-55 -50 -45

Gly Asp Thr Val Val Leu Glu Val Leu Glu Gly Ser Gly Ala Asp Val
-40 -35 -30

Ser Gly Leu Leu Ala Asp Ala Gly Val Asp Ala Ser Ala Val Glu Val
-25 -20 -15 -10

Thr Ser Ser Ala Gln Pro Glu Leu Tyr Ala Asp Ile Ile Gly Gly Leu
-5 -1 1 5

Ala Tyr Thr Met Gly Gly Arg Cys Ser Val Gly Phe Ala Ala Thr Asn
10 15 20

Ala Ala Gly Gln Pro Gly Phe Val Thr Ala Gly His Cys Gly Arg Val
25 30 35

Gly Thr Gln Val Ser Ile Gly Asn Gly Gln Gly Val Phe Glu Gln Ser
40 45 50 55

Ile Phe Pro Gly Asn Asp Ala Ala Phe Val Arg Gly Thr Ser Asn Phe
60 65 70

Thr Leu Thr Asn Leu Val Ser Arg Tyr Asn Thr Gly Gly Tyr Ala Thr
75 80 85

Val Ala Gly His Asn Gln Ala Pro Ile Gly Ser Ser Val Cys Arg Ser
90 95 100

Gly Ser Thr Thr Gly Trp His Cys Gly Thr Ile Gln Ala Arg Gly Gln
105 110 115

Ser Val Ser Tyr Pro Glu Gly Thr Val Thr Asn Met Thr Arg Thr Thr
120 125 130 135

Val Cys Ala Glu Pro Gly Asp Ser Gly Gly Ser Tyr Ile Ser Gly Asn
140 145 150

Gln Ala Gln Gly Val Thr Ser Gly Gly Ser Gly Asn Cys Arg Thr Gly
155 160 165

Gly Thr Thr Phe Tyr Gln Glu Val Thr Pro Met Val Asn Ser Trp Gly
170 175 180

Val Arg Leu Arg Thr
185

<210> 38 

<211> 43 

<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 1346 

<400> 38 

gttcatcgat cgcatcggct gccaccggac cgctccccca gtc 43



<210> 39 
<211> 38 
<212> DNA

<213> Artificial sequence 

<220> 

<223> Primer 1602 
<400> 39 

gcggatccta ttaggtccgg agacggacgc cccaggag 38 

<210> 40 
<211> 1062 
<212> DNA 

<213> Nocardiopsis prasina DSM 15649 
<400> 40 

gccaccggac cactccccca gtcacccacc ccggaggccg acgccgtctc catgcaggag 60 

gcgctccagc gcgacctcgg cctgaccccg cttgaggccg atgaactgct ggccgcccag 120 

gacaccgcct tcgaggtcga cgaggccgcg gccgaggccg ccggtgacgc ctacggcggc 180 

tccgtcttcg acaccgagac cctggaactg accgtcctgg tcaccgactc cgccgcggtc 240 

gaggcggtgg aggccaccgg cgccgggacc gaactggtct cctacggcat cacgggcctc 300 

gacgagatcg tcgaggagct caacgccgcc gacgccgttc ccggcgtggt cggctggtac 360 

ccggacgtcg cgggtgacac cgtcgtgctg gaggtcctgg agggttccgg cgccgacgtg 420 

ggcggcctgc tcgccgacgc cggcgtggac gcctcggcgg tcgaggtgac caccaccgag 480 

cagcccgagc tgtacgccga catcatcggc ggtctggcct acaccatggg cggccgctgt 540 

tcggtcggct tcgcggccac caacgccgcc ggtcagcccg ggttcgtcac cgccggtcac 600 

tgtggccgcg tgggcaccca ggtgaccatc ggcaacggcc ggggcgtctt cgagcagtcc 660 

atcttcccgg gcaacgacgc cgccttcgtc cgcggaacgt ccaacttcac gctgaccaac 720 

ctggtcagcc gctacaacac cggcggctac gccaccgtcg ccggtcacaa ccaggcgccc 780 

atcggctcct ccgtctgccg ctccggctcc accaccggtt ggcactgcgg caccatccag 840 

gcccgcggcc agtcggtgag ctaccccgag ggcaccgtca ccaacatgac gcggaccacc 900 

gtgtgcgccg agcccggcga ctccggcggc tcctacatct ccggcaacca ggcccagggc 960 

gtcacctccg gcggctccgg caactgccgc accggcggga ccaccttcta ccaggaggtc 1020 

acccccatgg tgaactcctg gggcgtccgt ctccggacct aa 1062 

<210> 41 

<211> 353 

<212> PRT 

<213> Nocardiopsis prasina DSM 15649
<220> 

<221> PROPEP 

<222> (1)..(165)

<220> 

<221> mat_peptide 
<222> (166)..(353)

<400> 41 
Ala Thr Gly Pro Leu Pro Gln Ser Pro Thr Pro Glu Ala Asp Ala
-165 -160 -155

Val Ser Met Gln Glu Ala Leu Gln Arg Asp Leu Gly Leu Thr Pro
-150 -145 -140

Leu Glu Ala Asp Glu Leu Leu Ala Ala Gln Asp Thr Ala Phe Glu
-135 -130 -125

Val Asp Glu Ala Ala Ala Glu Ala Ala Gly Asp Ala Tyr Gly Gly
-120 -115 -110

Ser Val Phe Asp Thr Glu Thr Leu Glu Leu Thr Val Leu Val Thr Asp
-105 -100 -95 -90

Ser Ala Ala Val Glu Ala Val Glu Ala Thr Gly Ala Gly Thr Glu Leu
-85 -80 -75

Val Ser Tyr Gly Ile Thr Gly Leu Asp Glu Ile Val Glu Glu Leu Asn
-70 -65 -60

Ala Ala Asp Ala Val Pro Gly Val Val Gly Trp Tyr Pro Asp Val Ala
-55 -50 -45

Gly Asp Thr Val Val Leu Glu Val Leu Glu Gly Ser Gly Ala Asp Val
-40 -35 -30

Gly Gly Leu Leu Ala Asp Ala Gly Val Asp Ala Ser Ala Val Glu Val
-25 -20 -15 -10

Thr Thr Thr Glu Gln Pro Glu Leu Tyr Ala Asp Ile Ile Gly Gly Leu
-5 -1 1 5

Ala Tyr Thr Met Gly Gly Arg Cys Ser Val Gly Phe Ala Ala Thr Asn
10 15 20

Ala Ala Gly Gln Pro Gly Phe Val Thr Ala Gly His Cys Gly Arg Val
25 30 35

Gly Thr Gln Val Thr Ile Gly Asn Gly Arg Gly Val Phe Glu Gln Ser
40 45 50 55

Ile Phe Pro Gly Asn Asp Ala Ala Phe Val Arg Gly Thr Ser Asn Phe
60 65 70

Thr Leu Thr Asn Leu Val Ser Arg Tyr Asn Thr Gly Gly Tyr Ala Thr
75 80 85

Val Ala Gly His Asn Gln Ala Pro Ile Gly Ser Ser Val Cys Arg Ser
90 95 100

Gly Ser Thr Thr Gly Trp His Cys Gly Thr Ile Gln Ala Arg Gly Gln
105 110 115

Ser Val Ser Tyr Pro Glu Gly Thr Val Thr Asn Met Thr Arg Thr Thr
120 125 130 135

Val Cys Ala Glu Pro Gly Asp Ser Gly Gly Ser Tyr Ile Ser Gly Asn
140 145 150

Gln Ala Gln Gly Val Thr Ser Gly Gly Ser Gly Asn Cys Arg Thr Gly
155 160 165

Gly Thr Thr Phe Tyr Gln Glu Val Thr Pro Met Val Asn Ser Trp Gly
170 175 180

Val Arg Leu Arg Thr
185

<210> 42 

<211> 43 

<212> DNA 

<213> Artificial sequence 
<220> 

<223> Primer 1603 

<400> 42 

gttcatcgat cgcatcggct gccaccggac cactccccca gtc 43

<210> 43 

<211> 353 

<212> PRT

<213> Nocardiopsis sp. NRRL 18262
<220> 

<221> PROPEP 

<222> (1)..(165)

<220> 

<221> mat_peptide 
<222> (166)..(353)

<400> 43 

Ala Thr Gly Ala Leu Pro Gln Ser Pro Thr Pro Glu Ala Asp Ala
-165 -160 -155

Val Ser Met Gln Glu Ala Leu Gln Arg Asp Leu Asp Leu Thr Ser
-150 -145 -140

Ala Glu Ala Glu Glu Leu Leu Ala Ala Gln Asp Thr Ala Phe Glu
-135 -130 -125

Val Asp Glu Ala Ala Ala Glu Ala Ala Gly Asp Ala Tyr Gly Gly
-120 -115 -110

-105 V ^ ASP ThP G i00 T 95 LeU Va1 Thr ASP 

Ala Ala Ala Val Glu Ala Val Glu Ala Thr Gly Ala Gly Thr Glu Leu
-85 -80 -75

Val Ser Tyr Gly Ile Asp Gly Leu Asp Glu Ile Val Gln Glu Leu Asn
-70 -65 -60

Ala Ala Asp Ala Val Pro Gly Val Val Gly Trp Tyr Pro Asp Val Ala
-55 -50 -45

Gly Asp Thr Val Val Leu Glu Val Leu Glu Gly Ser Gly Ala Asp Val
-40 -35 -30

Ser Gly Leu Leu Ala Asp Ala Gly Val Asp Ala Ser Ala Val Glu Val
-25 -20 -15 -10

Thr Thr Ser Asp Gln Pro Glu Leu Tyr Ala Asp Ile Ile Gly Gly Leu
-5 -1 1 5

Ala Tyr Thr Met Gly Gly Arg Cys Ser Val Gly Phe Ala Ala Thr Asn
10 15 20

Ala Ala Gly Gln Pro Gly Phe Val Thr Ala Gly His Cys Gly Arg Val
25 30 35

Gly Thr Gln Val Thr Ile Gly Asn Gly Arg Gly Val Phe Glu Gln Ser
40 45 50 55

Val Phe Pro Gly Asn Asp Ala Ala Phe Val Arg Gly Thr Ser Asn Phe
60 65 70

Thr Leu Thr Asn Leu Val Ser Arg Tyr Asn Thr Gly Gly Tyr Ala Thr
75 80 85

Val Ala Gly His Asn Gln Ala Pro Ile Gly Ser Ser Val Cys Arg Ser
90 95 100

Gly Ser Thr Thr Gly Trp His Cys Gly Thr Ile Gln Ala Arg Gly Gln
105 110 115

Ser Val Ser Tyr Pro Glu Gly Thr Val Thr Asn Met Thr Arg Thr Thr
120 125 130 135

Val Cys Ala Glu Pro Gly Asp Ser Gly Gly Ser Tyr Ile Ser Gly Thr
140 145 150

Gln Ala Gln Gly Val Thr Ser Gly Gly Ser Gly Asn Cys Arg Thr Gly
155 160 165

Gly Thr Thr Phe Tyr Gln Glu Val Thr Pro Met Val Asn Ser Trp Gly
170 175 180

Val Arg Leu Arg Thr
185

<210> 44 

<211> 1164 

<212> DNA 

<213> Artificial sequence
<220> 

<223> Synthetic protease encoding gene 
<220> 

<221> CDS 

<223> Full length protease
<220> 

<221> sig_peptide 

<222> (1)..(81)

<220> 

<221> misc_feature

<222> (82)..(1164)

<223> Propeptide 

<220> 

<221> mat_peptide 
<222> (577)..(1164)

<400> 44 

m»? f 3 -* fSS i cg ? tg 9 9 a aaa a S* 9 tc 9 ca a 9 c aca gca ctt ctt 45 
Met Lys Lys Pro Leu Gly Lys Ile Val Ala Ser Thr Ala Leu Leu
-190 -185 -180



att tea gt< 



a tg gca ttt age tea tct att gca tea qca act aca aaa an 90

Ile Ser Val Ala Phe Ser Ser Ser Ile Ala Ser Ala Ala Thr Gly
-175 -170 -165

gca tta ccg cag tct ccg aca ccg gaa gca qat aca ate tea at-o i« 
Ala Leu Pro Gln Ser Pro Thr Pro Glu Ala Asp Ala Val Ser Met
-160 -155 -150

?i a ??! ctg S aa aga 9" ctt gat ctt aca tea gca gaa gca 180 
Gln Glu Ala Leu Gln Arg Asp Leu Asp Leu Thr Ser Ala Glu Ala
-145 -140 -135

2i a ctt . ctt g ct gca caa gat aca gca ttt gaa gtg gat qaa 225 
Glu Glu Leu Leu Ala Ala Gln Asp Thr Ala Phe Glu Val Asp Glu
-130 -125 -120

gca gcg gca gaa gca gca gga gat gca tat aac aac tea art **r 
Ala Ala Ala Glu Ala Ala Gly Asp Ala Tyr Gly Gly Ser Val Phe
-115 -110 -105

gat aca gaa tea ctt gaa ctt aca gtt ctt gtt aca aat aca aca oca *ir 
Asp Thr Glu Ser Leu Glu Leu Thr Val Leu Val Thr Asp Ala Ala Ala
-100 -95 -90

\£l rlu If 3 ?- aa ??! ?£ a §9 a gca gga aca gta ctt gtt tea tat 366 
Val Glu Ala Val Glu Ala Thr Gly Ala Gly Thr Val Leu Val Ser Tyr
-85 -80 -75
gga att gat ggc ctt gat gaa att gtt caa gaa ctg aat gca gct gat 414
Gly Ile Asp Gly Leu Asp Glu Ile Val Gln Glu Leu Asn Ala Ala Asp
-70 -65 -60 -55

?« gtt ccg ggc gtt gtt ggc tgg tat ccg gat gtt get gga gat aca 462 
Ala Val Pro Gly Val Val Gly Trp Tyr Pro Asp Val Ala Gly Asp Thr
-50 -45 -40 

8*5 ? xt gaa g tt ctt gaa 99 a tca 9gc gca gat gtt tea ggc ctg 510 
Val Val Leu Glu Val Leu Glu Gly Ser Gly Ala Asp Val Ser Gly Leu
-35 -30 -25 

ctg gca gac gca gga gtc gat gca tca gca gtt gaa gtt aca aca tca 558 
Leu Ala Asp Ala Gly Val Asp Ala Ser Ala Val Glu Val Thr Thr Ser
-20 -15 -10 

gat caa ccg gaa ctt tat gca gat att att ggc ggc ctg gca tat tat 606
Asp Gln Pro Glu Leu Tyr Ala Asp Ile Ile Gly Gly Leu Ala Tyr Tyr
-5 -1 1 5 10

atg ggc ggc aga tgc agc gtt ggc ttt gca gca aca aat gca tca ggc 654
Met Gly Gly Arg Cys Ser Val Gly Phe Ala Ala Thr Asn Ala Ser Gly
15 20 25 

caa ccg ggc ttt gtt aca gca ggc cat tgc ggc aca gtt ggc aca cca 702 
Gln Pro Gly Phe Val Thr Ala Gly His Cys Gly Thr Val Gly Thr Pro
30 35 40 

8J? 5 ca 2?* S? c aat ggc aaa 99° 9tt ttt gaa cga agc att ttt ccg 750 
Val Ser Ile Gly Asn Gly Lys Gly Val Phe Glu Arg Ser Ile Phe Pro
45 50 55 

ggc aat gat tca gca ttt gtt aga ggc aca tca aat ttt aca ctt aca 798
Gly Asn Asp Ser Ala Phe Val Arg Gly Thr Ser Asn Phe Thr Leu Thr
60 65 70 

aat ctg gtt tca aga tat aat tca ggc ggc tat gca aca gtt gca ggc 846
Asn Leu Val Ser Arg Tyr Asn Ser Gly Gly Tyr Ala Thr Val Ala Gly
75 80 85 90 

S?c III S3 S? c t<?a 9« gtt tgc aga tca ggc tca aca 894 

His Asn Gln Ala Pro Ile Gly Ser Ala Val Cys Arg Ser Gly Ser Thr
95 100 105 

aca ggc tgg cat tgc ggc aca att caa gca aga aat caa aca gtt agg 942 
Thr Gly Trp His Cys Gly Thr Ile Gln Ala Arg Asn Gln Thr Val Arg
110 115 120

%*1 oS5 5 aa ?? c ^E a 9" tat agt ctg aca aga aca aca gtt tgt gca 990 
Tyr Pro Gln Gly Thr Val Tyr Ser Leu Thr Arg Thr Thr Val Cys Ala
125 130 135 

gaa ccg ggc gat tca ggc ggc tca tat att agc ggc act caa gca caa 1038 
Glu Pro Gly Asp Ser Gly Gly Ser Tyr Ile Ser Gly Thr Gln Ala Gln
140 145 150 

ggc gtt aca tca ggc ggc tca ggc aat tgc agt gct ggc ggc aca aca 1086
Gly Val Thr Ser Gly Gly Ser Gly Asn Cys Ser Ala Gly Gly Thr Thr
155 160 165 170 

tat tac caa gaa gtt aat ccg atg ctt agt tca tgg ggc ctt aca ctt 1134 
Tyr Tyr Gln Glu Val Asn Pro Met Leu Ser Ser Trp Gly Leu Thr Leu
175 180 185 

aga aca caa tcg cat gtt caa tcc gct cca 1164
Arg Thr Gln Ser His Val Gln Ser Ala Pro
190 195 
<210> 45
<211> 388 
<212> PRT 

<213> Artificial sequence 
<220> 

<223> Synthetic Construct
<400> 45 

Met Lys Lys Pro Leu Gly Lys Ile Val Ala Ser Thr Ala Leu Leu
-190 -185 -180

Ile Ser Val Ala Phe Ser Ser Ser Ile Ala Ser Ala Ala Thr Gly
-175 -170 -165

Ala Leu Pro Gln Ser Pro Thr Pro Glu Ala Asp Ala Val Ser Met
-160 -155 -150

Gln Glu Ala Leu Gln Arg Asp Leu Asp Leu Thr Ser Ala Glu Ala
-145 -140 -135

Glu Glu Leu Leu Ala Ala Gln Asp Thr Ala Phe Glu Val Asp Glu
-130 -125 -120

Ala Ala Ala Glu Ala Ala Gly Asp Ala Tyr Gly Gly Ser Val Phe
-115 -110 -105

Asp Thr Glu Ser Leu Glu Leu Thr Val Leu Val Thr Asp Ala Ala Ala
-100 -95 -90

Val Glu Ala Val Glu Ala Thr Gly Ala Gly Thr Val Leu Val Ser Tyr
-85 -80 -75

Gly Ile Asp Gly Leu Asp Glu Ile Val Gln Glu Leu Asn Ala Ala Asp
-70 -65 -60 -55

Ala Val Pro Gly Val Val Gly Trp Tyr Pro Asp Val Ala Gly Asp Thr
-50 -45 -40

Val Val Leu Glu Val Leu Glu Gly Ser Gly Ala Asp Val Ser Gly Leu
-35 -30 -25

Leu Ala Asp Ala Gly Val Asp Ala Ser Ala Val Glu Val Thr Thr Ser
-20 -15 -10

Asp Gln Pro Glu Leu Tyr Ala Asp Ile Ile Gly Gly Leu Ala Tyr Tyr
-5 -1 1 5 10

Met Gly Gly Arg Cys Ser Val Gly Phe Ala Ala Thr Asn Ala Ser Gly
15 20 25

Gln Pro Gly Phe Val Thr Ala Gly His Cys Gly Thr Val Gly Thr Pro
30 35 40

Val Ser Ile Gly Asn Gly Lys Gly Val Phe Glu Arg Ser Ile Phe Pro
45 50 55

Gly Asn Asp Ser Ala Phe Val Arg Gly Thr Ser Asn Phe Thr Leu Thr
60 65 70

Asn Leu Val Ser Arg Tyr Asn Ser Gly Gly Tyr Ala Thr Val Ala Gly
75 80 85 90

His Asn Gln Ala Pro Ile Gly Ser Ala Val Cys Arg Ser Gly Ser Thr
95 100 105

Thr Gly Trp His Cys Gly Thr Ile Gln Ala Arg Asn Gln Thr Val Arg
110 115 120

Tyr Pro Gln Gly Thr Val Tyr Ser Leu Thr Arg Thr Thr Val Cys Ala
125 130 135

Glu Pro Gly Asp Ser Gly Gly Ser Tyr Ile Ser Gly Thr Gln Ala Gln
140 145 150

Gly Val Thr Ser Gly Gly Ser Gly Asn Cys Ser Ala Gly Gly Thr Thr
155 160 165 170

Tyr Tyr Gln Glu Val Asn Pro Met Leu Ser Ser Trp Gly Leu Thr Leu
175 180 185

Arg Thr Gln Ser His Val Gln Ser Ala Pro
190 195

<210> 46 
<211> 165 
<212> PRT

<213> Artificial sequence 
<220> 

<223> Shuffled propeptide G-2.19
<220> 

<221> PROPEP 
<222> (1)..(165)

<400> 46 

Ala Thr Gly Ala Leu Pro Gln Ser Pro Thr Pro Glu Ala Asp Ala Val
1 5 10 15

Ser Met Gln Glu Ala Leu Gln Arg Asp Leu Asp Leu Thr Ser Ala Glu
20 25 30

Ala Glu Glu Leu Leu Ala Ala Gln Asp Thr Ala Phe Glu Val Asp Glu
35 40 45

Ala Ala Ala Ala Ala Ala Gly Asp Ala Tyr Gly Gly Ser Val Phe Asp
50 55 60

Thr Glu Ser Leu Thr Leu Thr Val Leu Val Thr Asp Ala Ser Ala Val
65 70 75 80

Glu Ala Val Glu Ala Ala Gly Ala Glu Ala Lys Val Val Ser His Gly
85 90 95

Met Glu Gly Leu Glu Glu Ile Val Ala Asp Leu Asn Ala Ala Asp Ala
100 105 110

Gln Pro Gly Val Val Gly Trp Tyr Pro Asp Ile His Ser Asp Thr Val
115 120 125

Val Leu Glu Val Leu Glu Gly Ser Gly Ala Asp Val Asp Ser Leu Leu
130 135 140

Ala Asp Ala Gly Val Asp Ala Ser Ala Val Glu Val Thr Thr Ser Asp
145 150 155 160

Gln Pro Glu Leu Tyr
165

<210> 47 
<211> 166 
<212> PRT 

<213> Artificial sequence 
<220> 

<223> Shuffled propeptide G-2.73 
<220> 

<221> PROPEP 
<222> (1)..(166)

<400> 47 

Ala Thr Gly Ala Leu Pro Gln Ser Pro Thr Pro Glu Ala Asp Ala Val
1 5 10 15

Ser Met Gln Glu Ala Leu Gln Arg Asp Leu Asp Leu Ser Ser Ala Glu
20 25 30

Ala Glu Glu Leu Leu Ala Ala Gln Asp Thr Ala Phe Glu Val Asp Glu
35 40 45

Ala Ala Ala Gly Ala Ala Gly Asp Ala Tyr Gly Gly Ser Val Phe Asp
50 55 60

Thr Glu Thr Leu Glu Leu Thr Val Leu Val Thr Asp Ala Ser Ala Val
65 70 75 80

Glu Ala Val Glu Ala Ala Gly Ala Glu Ala Lys Val Val Ser His Gly
85 90 95

Met Glu Gly Leu Glu Glu Ile Val Ala Asp Leu Asn Ala Ala Asp Ala
100 105 110

Gln Pro Gly Val Val Gly Trp Tyr Pro Asp Ile His Ser Asp Thr Val
115 120 125

Val Val Glu Val Leu Glu Gly Ser Gly Ala Asp Val Asp Ser Leu Leu
130 135 140

Ala Asp Ala Gly Val Asp Thr Ala Asp Val Lys Val Glu Ser Thr Thr
145 150 155 160

Glu Gln Pro Glu Leu Tyr
165

<210> 48 
<211> 166 
<212> PRT 

<213> Artificial sequence 
<220> 

<223> Shuffled propeptide G-1.43 
<220> 

<221> PROPEP 
<222> (1)..(166)

<400> 48 

Ala Thr Gly Ala Leu Pro Gln Ser Pro Thr Pro Glu Ala Asp Ala Val 
1 5 10 15 

Ser Met Gln Glu Ala Leu Gln Arg Asp Leu Gly Leu Ser Ser Ser Gln 
20 25 30 

Ala Glu Glu Leu Leu Asp Ala Gln Ala Glu Ser Phe Glu Ile Asp Glu 
35 40 45 

Ala Ala Ala Ala Ala Ala Gly Asp Ala Tyr Gly Gly Ser Ile Phe Asp 
50 55 60 

Thr Asp Ser Leu Thr Leu Thr Val Leu Val Thr Asp Ala Ser Ala Val 
65 70 75 80 

Glu Ala Val Glu Ala Ala Gly Ala Glu Ala Lys Val Val Ser His Gly 
85 90 95 

Met Glu Gly Leu Glu Glu Ile Val Ala Asp Leu Asn Ala Ala Asp Ala 
100 105 110 

Gln Pro Gly Val Val Gly Trp Tyr Pro Asp Ile His Ser Asp Thr Val 
115 120 125 

Val Leu Glu Val Leu Glu Gly Ser Gly Ala Asp Val Asp Ser Leu Leu 
130 135 140 

Ala Asp Ala Gly Val Asp Thr Ala Asp Val Lys Val Glu Ser Thr Thr 
145 150 155 160 

Glu Gln Pro Glu Leu Tyr 
165 

<210> 49 
<211> 166 
<212> PRT 

<213> Artificial sequence 
<220> 

<223> Shuffled propeptide G-2.6 
<400> 49 

Ala Thr Gly Ala Leu Pro Gln Ser Pro Thr Pro Glu Ala Asp Ala Val 
1 5 10 15 

Ser Met Gln Glu Ala Leu Gln Arg Asp Leu Asp Leu Thr Ser Ala Glu 
20 25 30 

Ala Glu Glu Leu Leu Ala Ala Gln Asp Thr Ala Phe Glu Val Asp Glu 
35 40 45 

Ala Ala Ala Ala Ala Ala Gly Asp Ala Tyr Gly Gly Ser Ile Phe Asp 
50 55 60 

Thr Glu Thr Leu Glu Leu Thr Val Leu Val Thr Asp Ser Ser Ser Val 
65 70 75 80 

Glu Ala Val Glu Ala Ala Gly Ala Glu Ala Lys Val Val Ser His Gly 
85 90 95 

Met Glu Gly Leu Glu Glu Ile Val Ala Asp Leu Asn Ala Ala Asp Ala 
100 105 110 

Gln Pro Gly Val Val Gly Trp Tyr Pro Asp Ile His Ser Asp Thr Val 
115 120 125 

Val Leu Glu Val Leu Glu Gly Ser Gly Ala Asp Val Asp Ser Leu Leu 
130 135 140 

Ala Gly Ala Gly Val Asp Thr Ala Asp Val Lys Val Glu Ser Thr Thr 
145 150 155 160 

Glu Gln Pro Glu Leu Tyr 
165 

<210> 50 
<211> 165 
<212> PRT 

<213> Artificial sequence 
<220>

<223> Shuffled propeptide G-2.5 
<220> 

<221> PROPEP 
<222> (1)..(165)

<400> 50 

Ala Thr Gly Ala Leu Pro Gln Ser Pro Thr Pro Glu Ala Asp Ala Val 
1 5 10 15 

Ser Met Gln Glu Ala Leu Gln Arg Asp Leu Gly Leu Thr Pro Leu Glu 
20 25 30 

Ala Glu Glu Leu Leu Ala Ala Gln Asp Thr Ala Phe Glu Val Asp Glu 
35 40 45 

Ala Ala Ala Glu Ala Ala Gly Asp Ala Tyr Gly Gly Ser Val Phe Asp 
50 55 60 

Thr Glu Thr Leu Glu Leu Thr Val Leu Val Thr Asp Ala Ser Ala Val 
65 70 75 80 

Glu Ala Val Glu Ala Ala Gly Ala Glu Ala Lys Val Val Ser His Gly 
85 90 95 

Met Glu Gly Leu Glu Glu Ile Val Ala Asp Leu Asn Ala Ala Asp Ala 
100 105 110 

Gln Pro Gly Val Val Gly Trp Tyr Pro Asp Ile His Ser Asp Thr Val 
115 120 125 

Val Leu Glu Val Leu Glu Gly Ser Gly Ala Asp Val Asp Ser Leu Leu 
130 135 140 

Ala Asp Ala Gly Val Asp Ala Ser Ala Val Glu Val Thr Pro Ala Ala 
145 150 155 160 

Arg Pro Glu Leu Tyr 
165 

<210> 51 
<211> 166 
<212> PRT 

<213> Artificial sequence 
<220> 
<223> Shuffled propeptide G-2.3

<220> 

<221> PROPEP 
<222> (1)..(166)

<400> 51 

Ala Thr Gly Ala Leu Pro Gln Ser Pro Thr Pro Asp Gly Ala Glu Ala 
1 5 10 15 

Thr Thr Met Val Glu Ala Leu Gln Arg Asp Leu Gly Leu Thr Pro Ala 
20 25 30 

Glu Ala Glu Glu Leu Leu Ala Ala Gln Asp Thr Ala Phe Glu Val Asp 
35 40 45 

Glu Ala Ala Ala Ala Ala Ala Gly Asp Ala Tyr Gly Gly Ser Ile Phe 
50 55 60 

Asp Thr Asp Ser Leu Thr Leu Thr Val Leu Val Thr Asp Ala Ala Ala 
65 70 75 80 

Val Glu Ala Val Glu Ala Ala Gly Ala Glu Ala Lys Val Val Ser His 
85 90 95 

Gly Met Glu Gly Leu Glu Glu Ile Val Ala Asp Leu Asn Ala Ala Asp 
100 105 110 

Ala Val Pro Gly Val Val Gly Trp Tyr Pro Asp Val Ala Gly Asp Thr 
115 120 125 

Val Val Leu Glu Val Leu Glu Gly Ser Gly Ala Asp Val Tyr Ser Leu 
130 135 140 

Leu Ala Asp Ala Gly Val Asp Ala Ser Ala Val Glu Val Thr Pro Ala 
145 150 155 160 

Ala Gln Pro Glu Leu Tyr 
165 

<210> 52 
<211> 166 
<212> PRT 

<213> Artificial sequence 
<220> 

<223> Shuffled propeptide G-1.4 
<220> 

<221> PROPEP 
<222> (1)..(166)

<400> 52

Ala Thr Gly Ala Leu Pro Gln Ser Pro Thr Pro Glu Ala Asp Ala Val 
1 5 10 15 

Ser Met Gln Glu Ala Leu Gln Arg Asp Leu Gly Leu Ser Ser Ser Gln 
20 25 30 

Ala Glu Glu Leu Leu Asp Ala Gln Ala Glu Ser Phe Glu Ile Asp Glu 
35 40 45 

Ala Ala Ala Ala Ala Ala Ala Asp Ser Tyr Gly Gly Ser Ile Phe Asp 
50 55 60 

Thr Asp Ser Leu Thr Leu Thr Val Leu Val Thr Asp Ala Ser Ala Val 
65 70 75 80 

Glu Ala Val Glu Ala Ala Gly Ala Glu Ala Lys Val Val Ser His Gly 
85 90 95 

Met Glu Gly Leu Glu Glu Ile Val Ala Asp Leu Asn Ala Ala Asp Ala 
100 105 110 

Gln Pro Gly Val Val Gly Trp Tyr Pro Asp Ile His Ser Asp Thr Val 
115 120 125 

Val Leu Glu Val Leu Glu Gly Ser Gly Ala Asp Val Asp Ser Leu Leu 
130 135 140 

Ala Asp Ala Gly Val Asp Thr Ala Asp Val Lys Val Glu Ser Thr Thr 
145 150 155 160 

Glu Gln Pro Glu Leu Tyr 
165 

<210> 53 
<211> 166 
<212> PRT 

<213> Artificial sequence 
<220> 

<223> Shuffled propeptide G-1.2
<220> 

<221> PROPEP 
<222> (1)..(166)

<400> 53 

Ala Thr Gly Ala Leu Pro Gln Ser Pro Thr Pro Glu Ala Asp Ala Val 
1 5 10 15 

Ser Met Gln Glu Ala Leu Gln Arg Asp Leu Asp Leu Thr Ser Ala Glu 
20 25 30 

Ala Glu Glu Leu Leu Ala Ala Gln Asp Thr Ala Phe Glu Val Asp Glu 
35 40 45 

Ala Ala Ala Ala Ala Ala Gly Asp Ala Tyr Gly Gly Ser Ile Phe Asp 
50 55 60 

Thr Glu Thr Leu Glu Leu Thr Val Leu Val Thr Asp Ser Ser Ser Val 
65 70 75 80 

Glu Ala Val Glu Ala Ala Gly Ala Glu Ala Lys Val Val Ser His Gly 
85 90 95 

Met Glu Gly Leu Glu Glu Ile Val Ala Asp Leu Asn Ala Ala Asp Ala 
100 105 110 

Gln Pro Gly Val Val Gly Trp Tyr Pro Asp Ile His Ser Asp Thr Val 
115 120 125 

Val Leu Glu Val Leu Glu Gly Ser Gly Ala Asp Val Asp Ser Leu Leu 
130 135 140 

Ala Gly Ala Gly Val Asp Thr Ala Asp Val Lys Val Glu Ser Thr Thr 
145 150 155 160 

Glu Gln Pro Glu Leu Tyr 
165 